Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix computation of changed files in generate-changed-only rule #665

Merged
merged 3 commits into from
Dec 16, 2019

Conversation

jlewi
Copy link
Contributor

@jlewi jlewi commented Dec 12, 2019

Fix computation of changed files in generate-changed-only rule

  • use git diff --name-only @{upstream} to compute the diff against the
    upstream branch. This should be better then the current branch
    which is making assumptions about the remote repo names.

  • Furthermore we need to make sure that when the base kustomization package
    changes that we also regenerate the tests for the overlay packages.

    • to support that we replace gen-test-targets.sh with a python script.

    • The bash scripts are pretty impenetrable; migrating to python should
      make the code easier to maintain.

  • The name of the go test files generated is slightly different from what
    the shell scripts were generating.

    • This is intended to make the naming more consistent

    • specifically a/b/c/kustimazation.yaml results in
      tests/a-b-c_test.go

    • It looks like the shell script was sometimes not including a in the name.

  • The python script also checks that for _test.go file there is a corresponding
    kustomization.yaml file; otherwise it will remove the test. This
    ensures if we move or remove a kustomize package we will end up
    removing the test.

  • Fix: make generate - assumptions about remote names #509

  • Update the pull request template with the command to only generate
    tests for changed files

  • Fix gen-test-targets should function regardless of checkout name #171 - gen-test-target.sh should function regardless
    of checked out name for the repository. We can use git to get the
    base directory and then do an appropriate string replace.

  • Need to update the test target generation
    to not assume the repository is named manifests


This change is Reviewable

@jlewi
Copy link
Contributor Author

jlewi commented Dec 12, 2019

/assign @kkasravi

@jlewi
Copy link
Contributor Author

jlewi commented Dec 12, 2019

/hold

@kkasravi make generate-changed-only doesn't seem to correctly update the tests. Specifically I did the following

  1. Modify profiles/base/kustomization.yaml - change the image
  2. Run make generate-changed-only
  3. Run make test

The tests fail

I think the problem is that make generate-changed-only ends up only changing the test for the profiles base package and not any of the test files corresponding to overlays.

Is that a problem with my PR or is that an existing issue with make generate-changed-only ?

@jlewi
Copy link
Contributor Author

jlewi commented Dec 12, 2019

/assign @gabrielwen
/hold cancel

@gabrielwen @kkasravi This is ready for review.

Copy link
Contributor

@krishnadurai krishnadurai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, @jlewi ! This has been a low hanging fruit for quite some time.

I've left a few comments to consider. PTAL.

import os
import subprocess

TOP_LEVEL_EXCLUDES = ["docs"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we include 'hack' and 'tests'?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

.github/pull_request_template.md Outdated Show resolved Hide resolved
# Generate a list of the files which have changed with respect to the upstream
# branch
modified_files = subprocess.check_output(
["git", "diff", "--name-only", "@{upstream}"])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking if it would be better to have a config file which can be populated with the upstream repo's name? If the config is empty - we could prompt the user to set it.

I just tried this out with my workflow:

git checkout -b branchname
git commit
git push personal_fork branch-name

There's no default upstream value set and I run into this error:

fatal: no upstream configured for branch 'current_branch'

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How would prompting the user be different from what happens now? i.e. you run it, see an error about no upstream branch; you set the upstream and then rerun it.

Wouldn't having a config file be duplicating what git is doing? i.e. git is already giving you a way to configure the upstream branch.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To clarify here: is the aim to generate the diff from the last upstream commit in the working branch?
If this is the case, a developer could forget to generate tests and push their changes upstream. The diffs wouldn't reflect the committed changes and the required tests might not get generated.

Alternatively, should we always compare against upstream/master assuming it is almost always updated?
This way, we avoid the branch locality of 'upstream' configuration changing and we can set our local configuration to upstream/master which will be independent of the branch which we are working on.

Please correct me if I'm mistaken.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is the case, a developer could forget to generate tests and push their changes upstream. The diffs wouldn't reflect the committed changes and the required tests might not get generated.

upstream should be kubeflow/manifests users should not be able to push changes to that commit because of an error.

Alternatively, should we always compare against upstream/master assuming it is almost always updated?

Isn't that what the code is doing? The code is using a diff whatever the user set as the upstream branch to determine what files are changed in the PR and if so generate the changed files.

My expectation is users set the upstream branch to kubeflow/manifests; not their personal fork of the kubeflow/manifests.

See for example
https://hackernoon.com/sync-a-fork-from-upstream-repo-in-github-c2c29c8eca3b

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. Thanks.

changed_dirs = set()
for top in os.listdir(root):
if top.startswith("docs"):
print("donotsubmit")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we log instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for catching this. This is debug code I'll remove in my next commit.

side bar: I'm really hoping there is a github action that we can use to automatically catch donotsubmit in code.

test_target_name = test_target_name.replace("-", "")
with open(test_path, "w") as test_file:
subprocess.check_call(["./hack/gen-test-target.sh", full_dir,
test_target_name],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not able to understand the use of this variable 'test_target_name'. Could you please explain?

gen-test-target uses only 1 argument as of now, if I'm not wrong.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for catching this. Its a bug; will revert in my next commit.

@krishnadurai
Copy link
Contributor

Unrelated to this PR:

Another thing which I'm considering is to change gen-test-target.sh. Right now gen-test-target.sh interprets the resources from kustomization.yaml before copying over the files to a test folder.

Should we interpret kustomization.yaml using this line?

for i in $(echo $(cat $directory/kustomization.yaml | grep '^- .*yaml$' | sed 's/^- //') $(cat $directory/kustomization.yaml | grep ' path: ' | sed 's/^.*: \(.*\)$/\1/') $(cat $directory/kustomization.yaml | sed '1,/^[ \t]*files:/d;/^[^ \t]/,$d' | sed 's/^[ \t]*- //') params.env secrets.env kustomization.yaml | sed 's/ /\\n/g' | sort | uniq | awk '{gsub(/\\n/,"\n")}1'); do

An alternative would be to generate outputs from the existing kustomize folder and comparing newer changes to kustomize against the older outputs.

Could you please explain if the current method has more benefits than the alternative suggested?

/cc @kkasravi @jlewi

@k8s-ci-robot
Copy link
Contributor

@krishnadurai: GitHub didn't allow me to request PR reviews from the following users: jlewi.

Note that only kubeflow members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

Unrelated to this PR:

Another thing which I'm considering is to change gen-test-target.sh. Right now gen-test-target.sh interprets the resources from kustomization.yaml before copying over the files to a test folder.

Should we interpret kustomization.yaml using this line?

for i in $(echo $(cat $directory/kustomization.yaml | grep '^- .*yaml$' | sed 's/^- //') $(cat $directory/kustomization.yaml | grep ' path: ' | sed 's/^.*: \(.*\)$/\1/') $(cat $directory/kustomization.yaml | sed '1,/^[ \t]*files:/d;/^[^ \t]/,$d' | sed 's/^[ \t]*- //') params.env secrets.env kustomization.yaml | sed 's/ /\\n/g' | sort | uniq | awk '{gsub(/\\n/,"\n")}1'); do

An alternative would be to generate outputs from the existing kustomize folder and comparing newer changes to kustomize against the older outputs.

Could you please explain if the current method has more benefits than the alternative suggested?

/cc @kkasravi @jlewi

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@jlewi
Copy link
Contributor Author

jlewi commented Dec 13, 2019

@krishnadurai whatever you do; please; please no more bash :)

The line of code

for i in $(echo $(cat $directory/kustomization.yaml | grep '^- .*yaml$' | sed 's/^- //') $(cat $directory/kustomization.yaml | grep ' path: ' | sed 's/^.*: \(.*\)$/\1/') $(cat $directory/kustomization.yaml | sed '1,/^[ \t]*files:/d;/^[^ \t]/,$d' | sed 's/^[ \t]*- //') params.env secrets.env kustomization.yaml | sed 's/ /\\n/g' | sort | uniq | awk '{gsub(/\\n/,"\n")}1'); do

is pretty unreadable/impenetrable; @krishnadurai kudos to for figuring it out.

See also #306

def remove_unmatched_tests(repo_root, package_dirs):
"""Remove any tests that don't map to a kustomization.yaml file.

This ensures tests don't linger if a pakage is deleted.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pakage -> package

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

logging.info("Ignoring %s", name)
continue

if not name in expected_tests:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we handle the case where a manifest has been deleted so the unit tests no longer apply?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kkasravi This code should ensure that if a/b/c/kustomization.yaml doesn't exist then the test a-b-c_test.go should not exist. So I believe that covers the use case you described or did I miss something?

Jeremy Lewi added 3 commits December 16, 2019 11:59
* use git diff --name-only @{upstream} to compute the diff against the
  upstream branch. This should be better then the current branch
  which is making assumptions about the remote repo names.

* Furthermore we need to make sure that when the base kustomization package
  changes that we also regenerate the tests for the overlay packages.

  * to support that we replace gen-test-targets.sh with a python script.

  * The bash scripts are pretty impenetrable; migrating to python should
    make the code easier to maintain.

* The name of the go test files generated is slightly different from what
  the shell scripts were generating.

  * This is intended to make the naming more consistent

  * specifically  a/b/c/kustimazation.yaml results in
    tests/a-b-c_test.go

  * It looks like the shell script was sometimes not including a in the name.

* The python script also checks that for _test.go file there is a corresponding
  kustomization.yaml file; otherwise it will remove the test. This
  ensures if we move or remove a kustomize package we will end up
  removing the test.

* Fix: kubeflow#509

* Update the pull request template with the command to only generate
  tests for changed files

* Fix kubeflow#171 - gen-test-target.sh should function regardless
  of checked out name for the repository. We can use git to get the
  base directory and then do an appropriate string replace.

* Need to update the test target generation
  to not assume the repository is named manifests

* Update the github pull_request template to tell users to run `make generate-changed-only`
Copy link
Contributor

@krishnadurai krishnadurai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

# Generate a list of the files which have changed with respect to the upstream
# branch
modified_files = subprocess.check_output(
["git", "diff", "--name-only", "@{upstream}"])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. Thanks.

@jlewi
Copy link
Contributor Author

jlewi commented Dec 16, 2019

/approve

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jlewi

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

make generate - assumptions about remote names gen-test-targets should function regardless of checkout name
6 participants