Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prune PT matrix based on impacted features calculated from GIBs impacted modules #10984

Merged
merged 6 commits into from
Oct 8, 2022

Conversation

MiguelWeezardo
Copy link
Member

@MiguelWeezardo MiguelWeezardo commented Feb 8, 2022

Description

This PR removes unnecessary jobs from the PT matrix based on the impacted features.

General information

This is a performance improvement over starting all PT environments only to have them skip tests later.
This change affects product test runner and GitHub CI jobs.

Related issues, pull requests, and links

Documentation

(x) No documentation is needed.
( ) Sufficient documentation is included in this PR.
( ) Documentation PR is available with #prnumber.
( ) Documentation issue #issuenumber is filed, and can be handled later.

Release notes

(x) No release notes entries required.
( ) Release notes entries required with the following suggested text:

@cla-bot cla-bot bot added the cla-signed label Feb 8, 2022
@MiguelWeezardo MiguelWeezardo force-pushed the gib-prune-pt-matrix branch 14 times, most recently from edf9109 to f534b18 Compare February 11, 2022 21:14
@MiguelWeezardo MiguelWeezardo force-pushed the gib-prune-pt-matrix branch 8 times, most recently from e2b53b8 to 76aa5e2 Compare February 18, 2022 10:45
@MiguelWeezardo MiguelWeezardo force-pushed the gib-prune-pt-matrix branch 2 times, most recently from 155983b to 0b148e3 Compare February 23, 2022 15:25
Copy link
Member

@electrum electrum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will be a nice improvement

TEXT(TextOutputBuilder::new),
JSON(JsonOutputBuilder::new);

final Supplier<OutputBuilder> outputBuilderFactory;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make private and add a getter

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done! Getter was not needed since it's only referenced here.

@Override
public void addTestRun(EnvironmentOptions environmentOptions, TestRun.TestRunOptions runOptions, Environment environment)
{
JsonSuite suite = output.getSuites().get(output.getSuites().size() - 1);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Iterables.getLast()

public class JsonOutput
{
private final String config;
private List<JsonSuite> suites = new ArrayList<>();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using a mutable object for serialization isn't how we normally write code in Trino. Instead, make this object immutable, and in the builder, keep track of the config name and list separately, then build the immutable JsonOutput object in the build() call.

{
private static final JsonCodec<JsonOutput> CODEC = new JsonCodecFactory().jsonCodec(JsonOutput.class);

private JsonOutput output;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Keep the config name and suite list separate here and build an immutable JsonOutput in the build() call.

this.config = config;
}

@JsonGetter
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use @JsonProperty. Same for the other classes.

@JsonGetter
public List<String> getFeatures()
{
return Stream.of(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we only care about these two features?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are the only two recognized by TestRun.hasImpactedFeatures() and the only ones that Environments can mark as configured. For any other feature, all tests will always be executed.

logging.debug("Parsing JSON: %s", line)
ptl_output = json.loads(line)
logging.debug("Handling JSON object: %s", ptl_output)
config_features = {(config, suite.get("name")): set([connector for testRun in suite.get("testRuns", []) for connector in testRun["environment"].get("features", [])])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Try to add some wrapping for readability

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The black Python formatter formats this as:

            config_features = {
                (config, suite.get("name")): set(
                    [
                        connector
                        for testRun in suite.get("testRuns", [])
                        for connector in testRun["environment"].get("features", [])
                    ]
                )
                for suite in ptl_output.get("suites", [])
            }

but I rewrote this into regular loops so it looks like:

            config_features = {}
            for suite in ptl_output.get("suites", []):
                key = (config, suite.get("name"))
                value = set()
                for testRun in suite.get("testRuns", []):
                    for connector in testRun["environment"].get("features", []):
                        value.add(connector)
                config_features[key] = value

I think this is more readable than nested list comprehension.

- 11
exclude:
- config: cdh5
ignore exclusion if: >-
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These ignore exclusion if don't seem to be used in the Python code.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They don't have to be explicitly handled. Because we render the matrix yaml in the ci.yml file, the GHA expressions should be evaluated, so there should only be True/False values. All matrix values will be combined when expanding in the expand_matrix function. There are also tests for that.

@nineinchnick nineinchnick force-pushed the gib-prune-pt-matrix branch from 92d7ad8 to 5295cab Compare July 25, 2022 08:03
@nineinchnick
Copy link
Member

@electrum thanks for the review! I addressed all comments but I can't resolve them, PTAL.

@nineinchnick nineinchnick force-pushed the gib-prune-pt-matrix branch 2 times, most recently from 7f0b2b9 to 9896107 Compare August 24, 2022 10:33
@MiguelWeezardo
Copy link
Member Author

MiguelWeezardo commented Aug 30, 2022

@electrum Could you take another look? Is this merge-worthy?

This will prevent us from accidentally commiting stale versions.
Download JAR used by ptl

Make sure errors in ptl suite describe don't prune entire PT matrix

Fix fetching of impacted connectors
Instead, unnecessary PTs are:
* added to the exclude list,
* filtered from the include list

The matrix is then expanded by GitHub's own code.
@MiguelWeezardo
Copy link
Member Author

@electrum Could you take another look at this? I'd like to get this PR merged, or at least get some feedback on what is missing.

@MiguelWeezardo MiguelWeezardo requested review from Praveen2112 and removed request for martint, wendigo, kokosing and bitweld October 6, 2022 09:48
@electrum electrum merged commit c43967a into trinodb:master Oct 8, 2022
@github-actions github-actions bot added this to the 400 milestone Oct 8, 2022
@nineinchnick nineinchnick mentioned this pull request Oct 17, 2022
@MiguelWeezardo MiguelWeezardo deleted the gib-prune-pt-matrix branch October 18, 2022 08:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla-signed dx Issues or pull requests related to developer experience
Development

Successfully merging this pull request may close these issues.

5 participants