Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow specifying PropagationPolicy when using deleteExisting #2676

Merged
merged 5 commits into from
Dec 21, 2020

Conversation

bbeaudreault
Copy link
Contributor

@bbeaudreault bbeaudreault commented Dec 17, 2020

Description

As the title says, this is something the API supports but kubernetes-client's generics were just not set up properly to allow crafting such a request. This updates the generics to support that use-case. Added an IT and unit test for the new functionality.

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • Feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change
  • Chore (non-breaking change which doesn't affect codebase;
    test, version modification, documentation, etc.)

Checklist

  • Code contributed by me aligns with current project license: Apache 2.0
  • I Added CHANGELOG entry regarding this change
  • I have implemented unit tests to cover my changes
  • I have added/updated the javadocs and other documentation accordingly
  • No new bugs, code smells, etc. in SonarCloud report
  • I tested my code in Kubernetes
  • I tested my code in OpenShift

@centos-ci
Copy link

Can one of the admins verify this patch?

@bbeaudreault
Copy link
Contributor Author

I have a very hard time running kubernetes-client tests locally or in my company's build environment. Will mark this PR ready for review once the checks above have passed.

@bbeaudreault bbeaudreault force-pushed the propagation-policy-delete-existing branch from c255356 to 49d2eae Compare December 17, 2020 16:02
@rohanKanojia
Copy link
Member

Which tests are you referring to? Mock Server tests in kubernetes-tests module or regression tests in kubernetes-itests module?

@bbeaudreault
Copy link
Contributor Author

I can basically never get the project to fully build in IntelliJ, usually due to weird configuration issues with all of the generated models or some other compile issue in one of the various extensions. One major issue is that fabric8's generated classes all go to target/generated-sources directly, rather than to a subdirectory within there. I think they should probably be sent to target/generated-sources/schemagen or something like that. The problem with going directly to generated-sources is that intellij assumes io is the container src dir, not part of the package. This messes up imports. So I have a hard time running any even unit tests within the IDE. That slows things down a lot. Do you have any tips on getting kubernetes-client to build smoothly in intellij?

I can get it to work using mvn clean verify on the command line, but it takes forever and eventually fails in the karaf-itests module. In the past I've tried limiting to just particular modules using -pl, but there is a lot of interdependency so it's like whack-a-mole trying to figure out the write set of modules to include.

I can't run the regression tests in kubernetes-itests, because I don't have a test environment to run them against. My machine is very much set up by my company's configuration scripts to be tightly coupled to our kubernetes environments, and we really dont want all of these actions executing one of our live clusters. I think I'd probably have to do this all in some sort of container setup specifically for fabric8 to ensure it's totally isolated, and I'm not sure how performant that would be.

return createTask.apply(item);
try {
return createTask.apply(item);
} catch (KubernetesClientException e) {
Copy link
Contributor Author

@bbeaudreault bbeaudreault Dec 17, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is related to my original change to support PropagationPolicy, but my new integration test caught this issue. The default grace period is 30s, so unless you explicitly specify 0 seconds (which we don't here and probably shouldn't if we could) it's possible that the DELETE call returns a 202 instead of 200, meaning it hasn't actually been deleted yet.

BaseOperation/OperationSupport/etc assume that a non-error response means the item was deleted. This is sort of true, but not really since it may have actually been queued for asynchronous deletion (per the 202 response). I looked into whether I could pipe the response Status through as the return value of the various delete() functions, but that proved to be a significant change (and probably a breaking one, since right now we expect to return a simple Boolean).

The simpler fix here seems to be to add a waiting period as I did here. Alternatively we could continually retry createTask for a period of time, but this might be less impactful on API servers?

@bbeaudreault bbeaudreault marked this pull request as ready for review December 17, 2020 18:27
@bbeaudreault
Copy link
Contributor Author

I think this is ready for review. The E2E tests all passed, and the unit tests all pass locally. The builds both failed due to some sort of timeout downloading dependencies. Any chance you can kick off another build?

@rohanKanojia
Copy link
Member

rohanKanojia commented Dec 17, 2020

You're not the first one pointing this out :-( (#2118). This project has grown a lot over time and became quite bulky. I don't know how other manage but I usually load project as maven project in my IntelliJ after mvn clean install -DskipTests(which takes 6-7 minutes). If I have to run/debug some test in IntelliJ, I usually comment out kubernetes-model-generator in root pom.xml and then reload. With this I'm able to run/debug stuff in kubernetes-client

For integration tests, I have to add kubernetes-itests module to my IDE and run it on some local kubernetes cluster(minikube/minishift etc)

@bbeaudreault
Copy link
Contributor Author

Looks like builds are still having issues. Not sure if there's something bigger going on with builds currently

@bbeaudreault
Copy link
Contributor Author

@manusa any chance I can get this PR and #2678 merged and included in the next 5.0.0 release? At that point I'd be happy to reset our internal fork to be based on 5.0.0 and help test there. We make heavy use of CustomResources, so interested to see how the 5.0.0 changes affect that.

await for deletion if the just-deleted resource still exists when trying to create
@bbeaudreault bbeaudreault force-pushed the propagation-policy-delete-existing branch from 63d9f26 to 173ab63 Compare December 17, 2020 21:12
@manusa
Copy link
Member

manusa commented Dec 18, 2020

Do you have any tips on getting kubernetes-client to build smoothly in intellij?

I hear you. It's really painful especially when changing to older branches where module structure is different from the current one.

The following setup works more or less well most of the times.
image

However still need to run manual compilation from time to time or rebuild the project. Again, lots of problems when switching to branches with deprecated modules and vice-versa. Need to manually tune those from time to time.

So any other tip to improve the experience is welcome ;)

@bbeaudreault
Copy link
Contributor Author

Thanks, I'll give that a shot. Any other comments on this PR? It's ready to be merged I think. The build is failing for unrelated reasons, but I can't kick it off myself. It succeeds locally.

@metacosm
Copy link
Collaborator

@bbeaudreault there's a conflict

@bbeaudreault
Copy link
Contributor Author

Whoops, sorry! I'll take care of that shortly

Unverified

This commit is not signed, but one or more authors requires that any commit attributed to them is signed.
@bbeaudreault
Copy link
Contributor Author

bbeaudreault commented Dec 18, 2020

Is it possible to get access to retry these PR checks? They seem flaky and I hate to keep bugging you guys. I know I can force push to kick them off, but I'd rather not have to re-run all of the checks each time, instead just the ones that failed.

The K8S v1.18.6 failed this time, but it has been succeeding consistently before. It failed in a test unrelated to my change.
The Java 11 and 8 builds are in some sort of hung state -- i can't access their logs. I was tailing the java 11 logs before and everything looked healthy. Last I saw, it had made it into the karaf tests. Not sure what is causing it to hang.

@manusa
Copy link
Member

manusa commented Dec 18, 2020

Is it possible to get access to retry these PR checks?

I'm not sure you can through GH interface

Today it's specially bad, Maven Central is also not very reliable today :(

@bbeaudreault
Copy link
Contributor Author

Sounds good. I see someone kicked off a new build, and the E2E all passed 🎉 . Unfortunately the build java tasks are still stuck from the previous run -- they didn't restart.

@manusa
Copy link
Member

manusa commented Dec 18, 2020

All of the build tasks are getting stuck. I think it might have to do with the tests and the new retry behavior.
(we need to add proper timeout configs to our CI workflows BTW)

@rohanKanojia
Copy link
Member

While testing this change, I found out that deletingExisting API is actually broken since v4.13.0. If we try creating an object which doesn't exist in server like this:

Pod pod = client.pods().load(CreateOrReplaceResource.class.getResourceAsStream("/test-pod.yml")).get();
client.resource(pod).inNamespace("default").deletingExisting().createOrReplace();

This would fail with this exception:

Exception in thread "main" io.fabric8.kubernetes.client.KubernetesClientException: Failed to delete existing item:example-pod
	at io.fabric8.kubernetes.client.utils.DeleteAndCreateHelper.deleteAndCreate(DeleteAndCreateHelper.java:51)
	at io.fabric8.kubernetes.client.utils.DeleteAndCreateHelper.deleteAndCreateItem(DeleteAndCreateHelper.java:75)

The same code works without any problems on v4.12.0

I think we should add a check whether resource exists before deleting it here:

public T deleteAndCreate(T item) {
Boolean deleted = deleteTask.apply(item);
if (Boolean.FALSE.equals(deleted)) {
throw new KubernetesClientException("Failed to delete existing item:" + item.getMetadata().getName());
}

It's not really related to this PR since I had done this refactor. Maybe we can tackle this as a part of this PR or in separate issue.

@bbeaudreault
Copy link
Contributor Author

@rohanKanojia I'm happy to add this here. Would you be ok with my just catching and handling the KubernetesClientException, rather than adding a new API call?

@rohanKanojia
Copy link
Member

rohanKanojia commented Dec 18, 2020

I checked and looks like Kubernetes Client only returns false on deletion when an item doesn't exist. So my assumption of throwing exception when delete exists false seems incorrect

} catch (KubernetesClientException e) {
if (e.getCode() != HttpURLConnection.HTTP_NOT_FOUND) {
throw e;
}
return false;

I think we should not throw the exception when deleteTask.apply(item) returns false. We should simply proceed to creation.

@bbeaudreault
Copy link
Contributor Author

bbeaudreault commented Dec 18, 2020

Cool, I'm making that change now

@bbeaudreault
Copy link
Contributor Author

FYI that change was of course easy enough to make, but I realized that my local maven actually wasn't running all of the tests. It was failing on karaf, which I thought was the last one, but actually was missing kubernetes-tests. I excluded kubernetes-tests and noticed that the build is actually hanging locally as well! It appears to be hanging in one of my deleteExisting tests. I'm trying to debug it but it's slow going because I can't run it in intellij, so can't easily debug. Still working through it. There appears to be a 400 client error being thrown in the test which causes one of the getLastRequest() calls to hang forever.

Unverified

This commit is not signed, but one or more authors requires that any commit attributed to them is signed.
…ted=false, and update tests
@bbeaudreault
Copy link
Contributor Author

bbeaudreault commented Dec 18, 2020

I've pushed the change we discussed, along with a little sanity checking. I also pushed the fix for the hung tests. Sorry for the runaround on that one... the lack of a local test env really got me. I was able to hack around it a bit, hence how i discovered the actual issue. Note that I can't use your suggested "Delegate IDE to maven" setting because there are no 5.0.0-SNAPSHOTs uploaded to maven central. I can't do a mvn install first because of my company's required local maven settings.

I still can't run the full suite in its entirety, so will keep an eye on the checks below and followup with fixes as needed.

@bbeaudreault
Copy link
Contributor Author

Looks like everything passed!

Comment on lines +57 to +58
// depending on the grace period, the object might not actually be deleted by the time we try to create
// if that's the case, give it some time.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this comment still accurate?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, still accurate. All of the stuff I mentioned about DELETE returning a 200 or 202 still applies. So we still need to wait for the deletion to occur before creating, otherwise we'll get a conflict.

@manusa manusa requested a review from rohanKanojia December 21, 2020 08:58
Copy link
Member

@manusa manusa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM (besides the constant comment). Thx!

@manusa manusa added this to the 5.0.0 milestone Dec 21, 2020
@bbeaudreault
Copy link
Contributor Author

@manusa all set, thanks!

@sonarqubecloud
Copy link

@bbeaudreault
Copy link
Contributor Author

Anything else you guys want to see here? Everything but the code coverage check is passing. I did add tests here, which seem sufficient? Let me know

@manusa manusa merged commit ac1c604 into fabric8io:master Dec 21, 2020
@manusa
Copy link
Member

manusa commented Dec 21, 2020

Thanks a lot @bbeaudreault
I'm going to start the release process of 5.0.0-beta1 as a first step to the 5.0.0 final release. Hopefully your team can provide some feedback on the new API and changes 🙏 😉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants