Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Openstack] master foreach and fixes #8709

Merged

Conversation

robinAwallace
Copy link
Contributor

@robinAwallace robinAwallace commented Apr 13, 2022

What type of PR is this?
/kind feature

What this PR does / why we need it:

  1. Added possibility to toggle to use existing network with use_existing_network.
  2. Added possibility to set port_security to null with force_null_port_security. Some clouds do not allow to set this variable.
  3. Added depends_on for port when attaching floating ip. For some clouds the port needs a fixed_ip before attaching a floating_ip.
  4. Fixed the dynamic inventory to find the floating ips with the new networking resource.
  5. Added possibility to create master nodes using for_each. Easier to switch out a master.

Which issue(s) this PR fixes:

Fixes #8696

Special notes for your reviewer:

It started of with just adding for_each for master nodes but ended yup fixing some other issues.

Does this PR introduce a user-facing change?:

[OpenStack] Create master nodes with `for_each`  for openstack. Makes it easier to switch out master nodes via terraform.

@k8s-ci-robot k8s-ci-robot added kind/feature Categorizes issue or PR as related to a new feature. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Apr 13, 2022
@k8s-ci-robot
Copy link
Contributor

Hi @robinAwallace. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Apr 13, 2022
@robinAwallace robinAwallace force-pushed the rw/terraform-master-foreach branch 2 times, most recently from f8e03c0 to 167c26e Compare April 13, 2022 06:45
@cristicalin
Copy link
Contributor

Nice work @robinAwallace !

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Apr 13, 2022
@cristicalin
Copy link
Contributor

@robinAwallace I see the elastix CI job is still failing: https://gitlab.com/kargo-ci/kubernetes-sigs-kubespray/-/jobs/2329389057 ; am I wrong in assuming that this change should have addressed that openstack failure as well ?

@robinAwallace
Copy link
Contributor Author

@cristicalin Yes it should fix it. By setting use_existing_network=false will ensure that a new network is created.

@@ -68,6 +68,14 @@ variable "network_id" {
default = ""
}

variable "use_existing_network" {
type = bool
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm guessing this should have a default to false.

@cristicalin
Copy link
Contributor

@cristicalin Yes it should fix it. By setting use_existing_network=false will ensure that a new network is created.

In this case I think this should be the default, no? I'm not sure how terraform initialises boolean variables without a default, maybe that explains the CI failure.

variable "use_existing_network" {
description = "Use a existing network"
type = bool
default = "true"
Copy link
Member

@floryut floryut Apr 13, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cristicalin default to true here (regarding your question about use_existing_network, as terraform don't init var, but fail if no defaults)

@robinAwallace
Copy link
Contributor Author

I can switch it so default false. If that will solve the pipeline issue.

@floryut
Copy link
Member

floryut commented Apr 13, 2022

I can switch it so default false. If that will solve the pipeline issue.

I'm not sure, default to true seems fine, but setting it to false for the elastx job maybe 😄
https://github.com/kubernetes-sigs/kubespray/blob/master/.gitlab-ci/terraform.yml#L161

@robinAwallace
Copy link
Contributor Author

I can switch it so default false. If that will solve the pipeline issue.

I'm not sure, default to true seems fine, but setting it to false for the elastx job maybe smile https://github.com/kubernetes-sigs/kubespray/blob/master/.gitlab-ci/terraform.yml#L161

Okay, will update it 🙂

@robinAwallace robinAwallace force-pushed the rw/terraform-master-foreach branch from 61cc623 to ddff56b Compare April 13, 2022 13:41
@cristicalin
Copy link
Contributor

I can switch it so default false. If that will solve the pipeline issue.

I'm not sure, default to true seems fine, but setting it to false for the elastx job maybe smile https://github.com/kubernetes-sigs/kubespray/blob/master/.gitlab-ci/terraform.yml#L161

In public clouds I think the safest approach is false as they don't give you access directly to their provider networks. In private clouds it depends some may go with the simple approach of one provider but that is not the norm.

@robinAwallace
Copy link
Contributor Author

I can switch it so default false. If that will solve the pipeline issue.

I'm not sure, default to true seems fine, but setting it to false for the elastx job maybe smile https://github.com/kubernetes-sigs/kubespray/blob/master/.gitlab-ci/terraform.yml#L161

In public clouds I think the safest approach is false as they don't give you access directly to their provider networks. In private clouds it depends some may go with the simple approach of one provider but that is not the norm.

@cristicalin Im open to have it false as default. It does not really matter for us.

@robinAwallace robinAwallace force-pushed the rw/terraform-master-foreach branch from ddff56b to e25f438 Compare April 14, 2022 09:44
@robinAwallace
Copy link
Contributor Author

@cristicalin @floryut use_existing_network is now default false.

@robinAwallace robinAwallace force-pushed the rw/terraform-master-foreach branch from e25f438 to 4af318e Compare April 28, 2022 05:28
@cristicalin
Copy link
Contributor

Thanks for fixing this @robinAwallace !

/approve

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cristicalin, robinAwallace

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 28, 2022
@cristicalin
Copy link
Contributor

/cc @floryut @oomichi

@k8s-ci-robot k8s-ci-robot requested a review from oomichi May 2, 2022 04:30
@oomichi
Copy link
Contributor

oomichi commented May 3, 2022

/lgtm

1 similar comment
@oomichi
Copy link
Contributor

oomichi commented May 3, 2022

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 3, 2022
@k8s-ci-robot k8s-ci-robot merged commit fe66121 into kubernetes-sigs:master May 3, 2022
@oomichi oomichi mentioned this pull request May 28, 2022
@rptaylor
Copy link
Contributor

rptaylor commented Jun 9, 2023

With a default of false it seems this has caused a change in behaviour. Previously you could just set use_neutron=0 and network_name to use a pre-existing network. Now this results in an error, so you also have to set use_existing_network = true.
It's fine but a release note would have been helpful to prepare for the change and avoid errors.

@rptaylor
Copy link
Contributor

rptaylor commented Jun 9, 2023

Also doesn't use_existing_network have effectively the same function as use_neutron already had?
@robinAwallace
Unfortunately use_neutron was added without documentation :( also it is weird that it takes 0 or 1 instead of true/false.
If use_neutron is redundant now perhaps it could be removed?

@robinAwallace
Copy link
Contributor Author

Hi @rptaylor, that was unfortunate that neutron and network name was missed. As you said it was not documented 😞
But yes it looks like neutron is redundant now.
Hope you manged to use the new way of using an existing network 🙂

@rptaylor
Copy link
Contributor

rptaylor commented Jun 12, 2023

This seems to change the mapping of ports to nodes, which causes Terraform to destroy and rebuild all nodes of an existing cluster after upgrading to Kubespray 2.19. This is very disruptive. I won't be able to manage my clusters with Terraform anymore (e.g. scale up/down nodes, modify security groups etc.) unless I destroy and rebuild them all from scratch.
Is there any workaround possible for existing users to keep using kubespray after upgrading to 2.19 without destroying and rebuilding all clusters @cristicalin @floryut @robinAwallace ?

If I do terraform plan it shows a port change forces replacement of each node, including masters:

  # module.compute.openstack_compute_instance_v2.k8s_master_no_floating_ip[0] must be replaced
-/+ resource "openstack_compute_instance_v2" "k8s_master_no_floating_ip" {
      ~ access_ip_v4        = "192.168.243.200" -> (known after apply)
      + access_ip_v6        = (known after apply)
      ~ all_metadata        = {
          - "depends_on"       = " "
          - "kubespray_groups" = "etcd,kube_control_plane,,k8s_cluster,no_floating"
          - "ssh_user"         = "almalinux"
          - "use_access_ip"    = "1"
        } -> (known after apply)
      ~ all_tags            = [] -> (known after apply)
      ~ flavor_name         = "p2-3gb" -> (known after apply)
      ~ id                  = "83bf862c-51cb-4a3c-982b-8616d1926e48" -> (known after apply)
      ~ image_id            = "Attempt to boot from volume - no image supplied" -> (known after apply)
      + image_name          = (known after apply)
        name                = "cluster-dev-k8s-master-nf-1"
      ~ region              = "RegionOne" -> (known after apply)
      ~ security_groups     = [
          - "cluster-dev-k8s",
          - "cluster-dev-k8s-master",
          - "cluster-dev-k8s-master-rules",
        ] -> (known after apply)
      - tags                = [] -> null
        # (7 unchanged attributes hidden)

      ~ block_device {
            # (7 unchanged attributes hidden)
        }

      ~ network {
          ~ fixed_ip_v4    = "192.168.243.200" -> (known after apply)
          + fixed_ip_v6    = (known after apply)
          + floating_ip    = (known after apply)
          ~ mac            = "fa:16:3e:e9:8d:18" -> (known after apply)
          ~ name           = "VLAN300" -> (known after apply)
          + port           = (known after apply) # forces replacement
          ~ uuid           = "4e8d0dcf-4853-454c-9184-d2379f397a9b" -> (known after apply)
            # (1 unchanged attribute hidden)
        }
    }

As for use_neutron, it looks like that is used to determine whether to create a router and subnet etc (contrib/terraform/openstack/modules/network/main.tf) whereas use_existing_network mainly handles setting up the ports for the nodes, so I guess both are still needed - or the new use_existing_network variable could be expanded to replace the remaining functionality of use_neutron.

@robinAwallace
Copy link
Contributor Author

robinAwallace commented Jun 13, 2023

Right, this version was really distributive for openstack. We really maintain our own wrapper around kubespray, with some migration scripts. You can find how we migrated our openstack terraform state to the new version here

Note that our migration script expects two clusters. But you could extract the bash script to work for one cluster.

LuckySB pushed a commit to southbridgeio/kubespray that referenced this pull request Jun 30, 2023
* [openstack] fix for new network modules

* [openstack] for-each master nodes
LuckySB pushed a commit to southbridgeio/kubespray that referenced this pull request Oct 23, 2023
* [openstack] fix for new network modules

* [openstack] for-each master nodes
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Terraform error when creating cluster internal network in OpenStack
7 participants