
Test Che6 to Che7 upgrade using che-operator #14200

Closed
skabashnyuk opened this issue Aug 12, 2019 · 22 comments

@skabashnyuk
Contributor

Is your task related to a problem? Please describe.

Install Che 6.19 and then attempt to update it to 7.0 using an operator.
Need to check

  • Make sure the user DB & workspace definitions in Postgres are preserved and the Che server successfully upgrades to 7.0
  • see if anything breaks... variables that got renamed? routes that are different? assumptions/APIs in Che 7 that changed?
@skabashnyuk skabashnyuk added kind/task Internal things, technical debt, and to-do tasks to be performed. status/open-for-dev An issue has had its specification reviewed and confirmed. Waiting for an engineer to take it. team/platform labels Aug 12, 2019
@skabashnyuk skabashnyuk added this to the 7.1.0 milestone Aug 12, 2019
@nickboldt
Contributor

Downstream: https://issues.jboss.org/browse/CRW-303

@metlos metlos self-assigned this Aug 23, 2019
@metlos metlos added status/in-progress This issue has been taken by an engineer and is under active development. and removed status/open-for-dev An issue has had its specification reviewed and confirmed. Waiting for an engineer to take it. labels Aug 23, 2019
@skabashnyuk skabashnyuk added the severity/P1 Has a major impact to usage or development of the system. label Sep 2, 2019
@skabashnyuk skabashnyuk modified the milestones: 7.1.0, 7.2.0 Sep 5, 2019
@metlos
Contributor

metlos commented Sep 16, 2019

I tested the upgrade on both K8s (using minikube 1.3.1) and Openshift 3.11 (using minishift).

I've found the following minor problems:

  1. Keycloak doesn't upgrade from che-keycloak:6.19.0 to che-keycloak:7.1.0 on operator upgrade. This seems to be caused by the Keycloak image version being explicitly set in the CR config (see the sketch below).
  2. When upgrading from che operator 0.5.0 (commit eclipse-che/che-operator@32b8e15), which is the last operator with a non-CRW version of Che6, to operator 7.1.0, the Keycloak configuration loses the original DB password and tries to use a newly generated one, leading to a failed upgrade because Keycloak cannot start.
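
For context, problem 1 boils down to the Keycloak image being pinned in the CheCluster custom resource. A minimal, illustrative sketch of such a CR fragment (field names follow the org.eclipse.che/v1 CheCluster spec pasted later in this thread; the tag value is only an example):

# Illustrative only: a pinned Keycloak image prevents the operator
# from rolling Keycloak forward when the operator itself is upgraded.
apiVersion: org.eclipse.che/v1
kind: CheCluster
metadata:
  name: eclipse-che
spec:
  auth:
    identityProviderImage: "eclipse/che-keycloak:6.19.0"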

@davidfestal do you think those are problems worth creating issues for?

@metlos metlos removed the status/in-progress This issue has been taken by an engineer and is under active development. label Sep 16, 2019
@skabashnyuk
Contributor Author

those are problems worth creating issues for

CC @l0rd @davidfestal wdyt, do we need to upgrade the Keycloak version for already existing CheCluster installations?

@metlos
Contributor

metlos commented Sep 17, 2019

those are problems worth creating issues for

CC @l0rd @davidfestal wdyt, do we need to upgrade the Keycloak version for already existing CheCluster installations?

I've created #14575 for that.

@davidfestal
Contributor

Sorry for the delay in answering, @metlos.

I would answer differently for these 2 cases:

Let me start with the second one:

2. When upgrading from che operator 0.5.0 (commit eclipse-che/che-operator@32b8e15), which is the last operator with a non-CRW version of Che6, to operator 7.1.0, the Keycloak configuration loses the original DB password and tries to use a newly generated one, leading to a failed upgrade because Keycloak cannot start.

That seems like a backward-compatibility bug to me. If there were a real possibility (and a compelling reason) that a user would still be using this old che operator version, then we should look deeper into the root cause of this and possibly fix it.

  1. Keycloak doesn't upgrade from che-keycloak:6.19.0 to che-keycloak:7.1.0 on operator upgrade. This seems to be caused by the Keycloak image version being explicitly set in the CR config.

This is an already-known issue that was still there in previous versions of Che 7. It has been fixed in Che 7.1.0 for all upcoming upgrades (7.1.0 to subsequent releases) and also specifically for the upgrade from 7.0.0 to 7.1.0, since we knew that 7.0.0 was the last version with this bug (see PR eclipse-che/che-operator#72 (comment)).

It's not possible to fix this for upgrades from any older version to 7.1.0, since users might also want to keep pinned versions in their CheCluster resources.

2 cases here:

  • if we plan to use a script for the migration of the CRW 1.2.x operator to 2.0, then we should be able to clean the frozen docker image versions in the CheCluster (set them back to empty strings for Keycloak, Postgres, etc.) just after upgrading the operator to 7.1.0 (a sketch of such a patch follows this list).
  • if we plan to let OperatorHub + marketplace do the upgrade of the operator, then the user would have to manually clean the frozen docker image versions in the CheCluster (set them back to empty strings for Keycloak, Postgres, etc.) just after upgrading the operator to 7.1.0.
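
A minimal sketch of that cleanup, assuming the CheCluster is named eclipse-che in the che namespace (as in the CR pasted later in this thread); it can be applied with kubectl edit, or as a merge patch via kubectl patch / oc patch:

# Hypothetical merge patch: reset the pinned images so that the operator
# falls back to its built-in defaults after the upgrade to 7.1.0.
spec:
  auth:
    identityProviderImage: ""
  database:
    postgresImage: ""
  storage:
    pvcJobsImage: ""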

@nickboldt @l0rd Does that seem like an acceptable solution?

If it is not acceptable, then we would need to implement the same type of automatic fix as we did for the 7.0.0 to 7.1.0 release:

  • identify a single CRW 1.2.x release that most CRW users would have (probably the latest release) => treat it the same as the default one
  • automatically clear the docker images in the CheCluster as soon as they correspond to the docker images of this single CRW release.

@metlos
Contributor

metlos commented Sep 23, 2019

Discovered another issue - #14630. This is not strictly an issue with the upgrade of the operator though. The upgrade of the operator to 7.1.0, followed by the redeployment of the Che server, seems to work. The issue only happens when the user makes changes to the CR of the che cluster.

@metlos
Contributor

metlos commented Sep 23, 2019

After all, this only affects one upgrade scenario:

  1. Using a Che6-based operator, install a che cluster CR that explicitly specifies the image tag of Che to install (see the CR sketch below).
  2. Upgrade the operator to a Che7-based one.
  3. The Che server deployment doesn't change because the image tag is explicitly configured.
  4. Now you need to manually change the tag to some 7.x version, which triggers #14630 (Operator regenerates DB password without redeploying Postgres).
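
To make step 1 concrete, a sketch of the relevant part of such a CR (field names as in the CheCluster spec; the image and tag values are purely illustrative):

# Illustrative CR fragment for step 1: the Che server image tag is pinned,
# so the operator upgrade alone does not roll the server forward (step 3).
spec:
  server:
    cheImage: "eclipse/che-server"
    cheImageTag: "6.19.0"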

@davidfestal
Contributor

@nickboldt @l0rd could you please provide your thoughts about the questions in comment #14200 (comment)?

It would be great to conclude on this and decide whether any action item is required before closing this analysis issue.

@davidfestal
Contributor

davidfestal commented Sep 24, 2019

After all, this only affects one upgrade scenario:

1. Using a Che6-based operator, install a che cluster CR that explicitly specifies the image tag of Che to install

2. Upgrade the operator to a Che7-based one.

3. The Che server deployment doesn't change because the image tag is explicitly configured.

4. Now you need to manually change the tag to some 7.x version, which triggers #14630.

@metlos Do you confirm that if you manually change the image tags in the custom resource by patching the existing CR in K8s, instead of applying it again, the upgrade works correctly?

The main reason for my question is that, when you apply the changed CR again from the yaml file you have locally, you override the Postgres and Keycloak passwords to empty strings. This triggers the generation of new passwords by the operator and leads to bug #14630.

So to summarize, bug #14630 would only impact the upgrade from 6 to 7 if the full CR is re-applied, instead of only patching the docker image and tag fields.
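
In other words, the safe path is a targeted patch that only touches the image fields and leaves the Postgres and Keycloak password fields of the live CR untouched, so the operator has no reason to regenerate them. A rough sketch (the tag value is illustrative):

# Applied with kubectl edit, or as a merge patch via kubectl patch / oc patch,
# against the live CheCluster resource; no password fields are included.
spec:
  server:
    cheImageTag: "7.1.0"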

@metlos
Contributor

metlos commented Sep 24, 2019

@metlos Do you confirm that if you manually change the image tags in the custom resource by patching the existing CR in K8s, instead of applying it again, the upgrade works correctly?

Yes, patching the CR seems to be working well.

@l0rd
Contributor

l0rd commented Oct 1, 2019

@metlos @davidfestal the keycloak image version problem can definitely be handled as a doc issue (hence we need to add the workaround to the docs to close this issue).

@nickboldt
Contributor

if we plan to use a script for the migration of the CRW 1.2.x operator to 2.0, then we should be able to clean the frozen docker image versions in the CheCluster (set them back to empty strings for Keycloak, Postgres, etc.) just after upgrading the operator to 7.1.0.

I'm hoping not to have to script an update process, but if we have to, we can ship it as documentation so it's not tightly tied to a release.

if we plan to let OperatorHub + marketplace do the upgrade of the operator, then the user would have to manually clean the frozen docker image versions in the CheCluster (set them back to empty strings for Keycloak, Postgres, etc.) just after upgrading the operator to 7.1.0.

This sounds preferable, except that we still have to support OCP 3.11 users. So would this mean telling them to install OLM in order to be able to use operator-based updates?

@skabashnyuk skabashnyuk modified the milestones: 7.2.0, 7.3.0 Oct 2, 2019
@davidfestal
Contributor

  1. Now you need to manually change the tag to some 7.x version

@metlos Instead of patching the custom resource with some Che 7.x version, you should just reset the docker image to an empty string, so that it takes the defaults.

Did you do this for all the following components:

  • Keycloak (I saw you did)
  • Postgres
  • the PVC jobs docker image?

I'm asking because it would be important to have tested the upgrade of the Postgres image as well (if it needs an upgrade).

@nickboldt
Contributor

+1, odds are good that with every CRW update we'll have new Keycloak, UBI8-minimal (PVC) and Postgres images because the hamster wheel of CVE fixes never stops. :)

@metlos
Contributor

metlos commented Oct 7, 2019

I have definitely not tested Postgres because it hasn't changed between the versions for either Che or CRW. Not sure about PVC.

@skabashnyuk
Contributor Author

The scope of this task as defined in the description is complete.
Next steps

I would like to ask not to extend the scope of this issue; instead, please create a new one.

@sleshchenko
Member

I've tested the following scenario:
I deployed Che with chectl and the operator on OpenShift 4.2. The Che operator image is quay.io/eclipse/che-operator:nightly.
It created a CR where some fields, like the postgres image, are empty by default.
Then I made sure that all images have empty values (che server, postgres, keycloak, plugin-registry, devfile-registry, pvcJobsImage):

apiVersion: org.eclipse.che/v1
kind: CheCluster
metadata:
  creationTimestamp: 2019-10-08T13:32:47Z
  finalizers:
  - oauthclients.finalizers.che.eclipse.org
  generation: 19
  name: eclipse-che
  namespace: che
  resourceVersion: "21659"
  selfLink: /apis/org.eclipse.che/v1/namespaces/che/checlusters/eclipse-che
  uid: 1daebd0e-e9d0-11e9-803c-12a6fdf53914
spec:
  auth:
    # ...
    identityProviderImage: ""
    # ...
  database:
    # ...
    postgresImage: ""
    # ...
  server:
    # ...
    cheImage: ""
    cheImagePullPolicy: ""
    cheImageTag: ""
    # ...
    devfileRegistryImage: ""
    # ...
    pluginRegistryImage: ""
    # ...
  storage:
    # ...
    pvcJobsImage: ""
    # ...

After the che-operator reports that Che is available again, we have the following state:

  • Plugin Registry: quay.io/eclipse/che-plugin-registry:7.2.0
  • Devfile Registry: quay.io/eclipse/che-devfile-registry:7.2.0
  • Keycloak: eclipse/che-keycloak:7.2.0
  • Postgres: centos/postgresql-96-centos7:9.6
  • Che: eclipse/che-server:7.2.0
  • CHE_INFRA_KUBERNETES_PVC_JOBS_IMAGE: 'registry.access.redhat.com/ubi8-minimal:8.0-213'

@metlos
Contributor

metlos commented Oct 8, 2019

After the che-operator reports that Che is available again, we have the following state:

I think we should test this:

  1. Add custom tags for all the above mentioned images
  2. Configure the CR to use those custom tags
  3. After Che is redeployed with these new images
  4. Set the image properties back to empty strings
  5. The operator should use the default values for the images again.

It is basically what you have done with the difference that we're not creating new deployments but rolling over existing ones.

@davidfestal does the above make sense or do you think it is superfluous?

@nickboldt
Contributor

nickboldt commented Oct 9, 2019

+1 to retesting with rolling updates.

Re: postgres updates...

I have definitely not tested Postgres because it hasn't changed between the versions for either Che or CRW. Not sure about PVC.

Two problems with that statement:

a) CRW 1.2.2 is using registry.redhat.io/rhscl/postgresql-96-rhel7:1-46 and 2.0 will be using registry.redhat.io/rhscl/postgresql-96-rhel7:1-47 (or newer) so you CAN test a simple version change.

b) even with the upstream Che 6 -> 7 upgrade, centos/postgresql-96-centos7:9.6 is a moving tag which represents the latest set of applied CVE fixes. So :9.6 from 2 months ago is a different version from today's. You can see this from the tags in the repo, which are 5 or 13 days old:

https://hub.docker.com/r/centos/postgresql-96-centos7/tags?page=1&ordering=last_updated

So... the flaw here is that you're always using a moving tag instead of a static one.

Re: "not sure about PVC", we can again test an update there:

  • 1.2.2 / 6.19.x: registry.access.redhat.com/ubi8-minimal:8.0-159
    vs.
  • 2.0 / 7.3 / master: registry.access.redhat.com/ubi8-minimal:8.0-213

So... reopening. Task not complete. @metlos can you investigate the two missing update path scenarios?

@nickboldt nickboldt reopened this Oct 9, 2019
@davidfestal
Contributor

I think we should test this:

1. Add custom tags for all the above mentioned images

What do you mean precisely by this step? You could simply pin old tags for each image.

2. Configure the CR to use those custom tags

I would simply create the CR with these custom (old) tags from the start.

3. After Che is redeployed with these new images

s/redeployed/deployed

4. Set the image properties back to empty strings

5. The operator should use the default values for the images again.

And it should have triggered the rolling update of all the components (apart from Postgres, which is restarted instead of rolling-updated).

It is basically what you have done with the difference that we're not creating new deployments but rolling over existing ones.

@davidfestal does the above make sense or do you think it is superfluous?

I think it can be interesting, especially if you use it with running workspaces.

Another point: you might be able to set the cheFlavor to codeready when creating the initial custom resource, so that it uses the CRW images. Then you would be able to really test the CRW update from 1.2 to 2.0 (just look at the default.go file in the che-operator 1.x branch at the 1.2.0 tag to know which images are used by default in CRW 1.2).
Of course, to download the CRW docker images you will need the subscription secret to access the private registry. @rhopp and @nickboldt might help with that.
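
A sketch of that starting point (assuming cheFlavor sits under spec.server like the other server fields; the resource name is illustrative):

# Illustrative: an initial CheCluster CR that makes the operator pick the CRW
# default images (see default.go on the che-operator 1.x branch at the 1.2.0 tag
# for the images CRW 1.2 uses by default).
apiVersion: org.eclipse.che/v1
kind: CheCluster
metadata:
  name: codeready-workspaces
spec:
  server:
    cheFlavor: codeready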

@metlos
Contributor

metlos commented Oct 10, 2019

Let me reply to all of the questions raised and swiftly close this issue again.

CRW has nada to do with this issue, which is about the upgrade of Che6 to Che7. If you need to test upgrades of CRW, you're of course free to create another issue, which will go through the normal loops.

Sorry if I sound harsh, but let's just not constantly extend the scope of this issue; let's return to "defined scope -> work -> done -> problems found -> new issue" so that we can actually make progress.

CRW 1.2.2 is using registry.redhat.io/rhscl/postgresql-96-rhel7:1-46 and 2.0 will be using registry.redhat.io/rhscl/postgresql-96-rhel7:1-47 (or newer) so you CAN test a simple version change.

How on earth is the platform team supposed to do that? At the time we tested this, the operator on master still used the old image. If that's not gonna be the case, then let's open a new issue.

b) even with the upstream Che 6 -> 7 upgrade, centos/postgresql-96-centos7:9.6 is a moving tag which represents the latest set of applied CVE fixes. So :9.6 from 2 months ago is a different version from today's. You can see this from the tags in the repo, which are 5 or 13 days old:

which still amounts to a no-op with regard to testing the upgrade itself. Are we supposed to test Kubernetes now?

So... the flaw here is that you're always using a moving tag instead of a static one.

Sure, let's create a new issue for that. Nothing to do with testing the upgrade.

Re: "not sure about PVC", we can again test an update there:

1.2.2 / 6.19.x: registry.access.redhat.com/ubi8-minimal:8.0-159
vs.
2.0 / 7.3 / master: registry.access.redhat.com/ubi8-minimal:8.0-213

So... reopening. Task not complete. @metlos can you investigate the two missing update path scenarios?

I think you can guess my sentiment on this. No, I will not do that in the scope of this issue.

Another point, you might be able to set the cheFlavor to codeready when creating the initial custom resource. So it should use the CRW images. Then you would be able to really test CRW update from 1.2 to 2.0 (just look at the default.go file in the che-operator 1.x branch at the 1.2.0 tag to know what images will be used by default in CRW 1.2).

Let's create another issue for this. This issue is about Che, not CRW. I would also argue the platform team of Che is not responsible for testing CRW.

@nickboldt
Contributor

As requested, follow-up work has been moved to 2 new issues:
