Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use csiNode informer to prevent hitting the API server for all time #327

Conversation

mucahitkurt
Copy link
Contributor

What type of PR is this?
/kind cleanup

What this PR does / why we need it:
Use csiNode shared index informer instead of hitting the API server for all csiNode resource request.

Which issue(s) this PR fixes:
Fixes #144

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

NONE

cc @msau42

@k8s-ci-robot
Copy link
Contributor

Welcome @mucahitkurt!

It looks like this is your first PR to kubernetes-csi/external-provisioner 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes-csi/external-provisioner has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Aug 12, 2019
@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Aug 12, 2019
@k8s-ci-robot
Copy link
Contributor

Hi @mucahitkurt. Thanks for your PR.

I'm waiting for a kubernetes-csi or kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Aug 12, 2019
@mucahitkurt mucahitkurt force-pushed the use-node-and-csinodeinfo-informers branch from 727cb01 to ae4316b Compare August 12, 2019 15:46
@msau42
Copy link
Collaborator

msau42 commented Aug 12, 2019

/ok-to-test
/assign @verult

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Aug 12, 2019
@msau42
Copy link
Collaborator

msau42 commented Aug 14, 2019

/retest

Copy link
Contributor

@verult verult left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR!

Was the Gopkg.lock change intentional? There isn't any Gopkg.toml changes.

Did you get a chance to test the provisioner end to end manually?

// TODO (#144): use informers
selectedNodeInfo, err := kubeClient.StorageV1beta1().CSINodes().Get(selectedNode.Name, metav1.GetOptions{})
if err != nil {
key := fmt.Sprintf("%s/%s", selectedNode.Namespace, selectedNode.Name)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Node objects are non-namespaced. Is this the right key format?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No it's not right, it should be only name


rand.Shuffle(len(nodeInfos.Items), func(i, j int) {
nodeInfos.Items[i], nodeInfos.Items[j] = nodeInfos.Items[j], nodeInfos.Items[i]
nodeInfos := informer.GetIndexer().List()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: rename all occurrences of nodeInfo to csiNode

I know you didn't make the name but it'd be super helpful if you could change it while you are at it, but totally optional

preBetaNodeVersion = "1.13.0"
testDriverName = "com.example.csi/test-driver"
preBetaNodeVersion = "1.13.0"
interval = 100 * time.Millisecond
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use a more descriptive name - what interval is it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll delete this after HasSynced method usage

Labels: l,
Name: fmt.Sprintf("node-%d", i),
Labels: l,
Namespace: "testnamespace",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Node objects shouldn't have namespaces, so this isn't realistic

@@ -1528,3 +1550,20 @@ func requisiteEqual(t1, t2 []*csi.Topology) bool {

return unchecked.Len() == 0
}

func runCsiNodeInformer(kubeClient *fakeclientset.Clientset, t *testing.T, csiNodeNum int) (cache.SharedIndexInformer, chan struct{}) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: csiNodeNum -> csiNodeCount . "Num" could be misinterpreted as index, for example

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll delete it after HasSynced method usage

@@ -179,9 +181,27 @@ func main() {
provisionerOptions = append(provisionerOptions, controller.AdditionalProvisionerNames([]string{supportsMigrationFromInTreePluginName}))
}

// Create informer to prevent hit the API server for all resource request
csiNodeInformer := v1beta1.NewCSINodeInformer(clientset, 1*time.Hour, cache.Indexers{cache.NamespaceIndex: cache.MetaNamespaceIndexFunc})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull 1 * time.Hour into a named const


func waitToObserveCsiNodes(t *testing.T, csiNodeInformer cache.SharedIndexInformer, csiNodeNum int) {
if err := wait.PollImmediate(interval, informerWaitTimeout, func() (bool, error) {
objects := csiNodeInformer.GetIndexer().List()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this just make sure the informer is synced? Could you use informer.HasSynced() instead?

@@ -252,6 +255,7 @@ func NewCSIProvisioner(client kubernetes.Interface,
controllerCapabilities: controllerCapabilities,
supportsMigrationFromInTreePluginName: supportsMigrationFromInTreePluginName,
strictTopology: strictTopology,
informer: informer,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to wait for the informer to sync.

IIRC other CSI controllers have examples of using informers and they should have sync wait logic

@@ -179,9 +181,27 @@ func main() {
provisionerOptions = append(provisionerOptions, controller.AdditionalProvisionerNames([]string{supportsMigrationFromInTreePluginName}))
}

// Create informer to prevent hit the API server for all resource request
csiNodeInformer := v1beta1.NewCSINodeInformer(clientset, 1*time.Hour, cache.Indexers{cache.NamespaceIndex: cache.MetaNamespaceIndexFunc})
go csiNodeInformer.Run(context.TODO().Done())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO context.Background() is more appropriate

@@ -179,9 +181,27 @@ func main() {
provisionerOptions = append(provisionerOptions, controller.AdditionalProvisionerNames([]string{supportsMigrationFromInTreePluginName}))
}

// Create informer to prevent hit the API server for all resource request
csiNodeInformer := v1beta1.NewCSINodeInformer(clientset, 1*time.Hour, cache.Indexers{cache.NamespaceIndex: cache.MetaNamespaceIndexFunc})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change 1*time.Hour into a named const. Ditto for all the other time constants

@mucahitkurt
Copy link
Contributor Author

mucahitkurt commented Aug 15, 2019

Thanks for the PR!

Was the Gopkg.lock change intentional? There isn't any Gopkg.toml changes.

Did you get a chance to test the provisioner end to end manually?

Thanks for the comments, I'll try to fix them.

Yes, Gopkg.lock change is intentional, Gopkg.toml is not changed because I did not add a new library dependency, I just added some package dependency under client-go.

No, I didn't make e2e manual test, TBH I'm not very familiar with the all environment of external provisioner but I'll try to run e2e test for the external provisioner.

@mucahitkurt mucahitkurt force-pushed the use-node-and-csinodeinfo-informers branch from ae4316b to 0e58fc7 Compare August 15, 2019 22:26
@msau42
Copy link
Collaborator

msau42 commented Aug 15, 2019

It might be worth implementing topology first in the hostpath csi driver: kubernetes-csi/csi-driver-host-path#54. Then, we have automated pull/ci jobs that can test the topology path. (We'll also need to update those pull jobs to bring up a multi-node cluster)

I am also working on adding additional topology tests as part of kubernetes/kubernetes#71289

@mucahitkurt
Copy link
Contributor Author

/test pull-kubernetes-csi-external-provisioner-1-13-on-kubernetes-1-13

@mucahitkurt
Copy link
Contributor Author

/retest

@mucahitkurt
Copy link
Contributor Author

@verult Do you have any idea why pull-kubernetes-csi-external-provisioner-1-13-on-kubernetes-1-13 tests are failed? Thanks!

@msau42
Copy link
Collaborator

msau42 commented Aug 19, 2019

E0816 19:40:18.164363       1 reflector.go:125] github.com/kubernetes-csi/external-provisioner/cmd/csi-provisioner/csi-provisioner.go:186: Failed to list *v1beta1.CSINode: the server could not find the requested resource

Topology was an alpha feature in 1.13, so the beta object doesn't exist. To avoid this, we should only create the informer if the topology feature gate in the external-provisioner is enabled.

@mucahitkurt mucahitkurt force-pushed the use-node-and-csinodeinfo-informers branch from 0e58fc7 to 3614485 Compare August 19, 2019 19:16
@msau42
Copy link
Collaborator

msau42 commented Aug 20, 2019

/test pull-kubernetes-csi-external-provisioner-1-15-on-kubernetes-1-15

var csiNodeInformer cache.SharedIndexInformer
if utilfeature.DefaultFeatureGate.Enabled(features.Topology) {
// Create informer to prevent hit the API server for all resource request
csiNodeInformer := v1beta1.NewCSINodeInformer(clientset, ctrl.ResyncPeriodOfCsiNodeInformer, cache.Indexers{cache.NamespaceIndex: cache.MetaNamespaceIndexFunc})
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure indexing by namespace makes sense because CSINode is a non-namespaced object.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll change to cache.Indexers{}

@@ -235,7 +239,8 @@ func NewCSIProvisioner(client kubernetes.Interface,
pluginCapabilities connection.PluginCapabilitySet,
controllerCapabilities connection.ControllerCapabilitySet,
supportsMigrationFromInTreePluginName string,
strictTopology bool) controller.Provisioner {
strictTopology bool,
informer cache.SharedIndexInformer) controller.Provisioner {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of passing in the entire informer to the provisioner, how about only passing in the Lister(), since all we need is Get() and List()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll change to informer.GetStore()

@mucahitkurt mucahitkurt force-pushed the use-node-and-csinodeinfo-informers branch from 3614485 to 3ecf51a Compare August 20, 2019 22:02
@mucahitkurt
Copy link
Contributor Author

/retest

1 similar comment
@mucahitkurt
Copy link
Contributor Author

/retest

@mucahitkurt mucahitkurt force-pushed the use-node-and-csinodeinfo-informers branch from 7bdb6d2 to b961349 Compare October 1, 2019 19:02
var factory informers.SharedInformerFactory
if utilfeature.DefaultFeatureGate.Enabled(features.Topology) {
// Create informer to prevent hit the API server for all resource request
factory = informers.NewSharedInformerFactory(clientset, ctrl.ResyncPeriodOfCsiNodeInformer)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if the lister is only necessary when topology feature is used, should it use supportsTopology() instead to capture all conditions?

func (p *csiProvisioner) supportsTopology() bool {

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean to change the condition utilfeature.DefaultFeatureGate.Enabled(features.Topology) with pluginCapabilities[csi.PluginCapability_Service_VOLUME_ACCESSIBILITY_CONSTRAINTS] && utilfeature.DefaultFeatureGate.Enabled(features.Topology) or do you mean to use csiNodeLister as initiated inside the csiProvisioner with the supportsTopology() check instead of a external provided property of the struct?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added additional check for plugin capabilities.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup this is what I meant, though in its current form, I'm worried that someone later on will only change one place but not the other. It would be good to have a shared utility function instead in a different file (maybe package/util)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, thanks. Putting the function inside topology.go instead of util package makes sense to me.

Copy link
Contributor

@verult verult left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry about the delay, just a few more last comments and everything else looks good!

// error with the API server.
return nil, fmt.Errorf("error listing CSINodes: %v", err)
klog.Warningf("No csi nodes found")
return nil, nil
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How come it's OK to not return an error now?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I couldn't remember any valid case for this change, I think I changed it in a wrong way, I'll change it to return error.

@mucahitkurt mucahitkurt force-pushed the use-node-and-csinodeinfo-informers branch from b961349 to 704a706 Compare October 9, 2019 15:44
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 9, 2019
@mucahitkurt mucahitkurt force-pushed the use-node-and-csinodeinfo-informers branch from 704a706 to 71473f7 Compare October 9, 2019 15:54
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 9, 2019
@mucahitkurt mucahitkurt force-pushed the use-node-and-csinodeinfo-informers branch from 71473f7 to 70c2877 Compare October 10, 2019 15:36
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 10, 2019
@mucahitkurt mucahitkurt force-pushed the use-node-and-csinodeinfo-informers branch from 70c2877 to 6061117 Compare October 10, 2019 15:46
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 10, 2019
@mucahitkurt mucahitkurt force-pushed the use-node-and-csinodeinfo-informers branch from 6061117 to 14e1e38 Compare October 22, 2019 19:01
@mucahitkurt
Copy link
Contributor Author

@verult PTAL

@mucahitkurt mucahitkurt force-pushed the use-node-and-csinodeinfo-informers branch from 14e1e38 to d5b98b8 Compare October 22, 2019 20:39
@verult
Copy link
Contributor

verult commented Oct 24, 2019

/lgtm

Thanks @mucahitkurt !

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 24, 2019
@@ -168,10 +168,6 @@ k8s.io/api/apps/v1
k8s.io/api/apps/v1beta1
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is updating this file necessary? it looks like just some lines got moved around?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes content is the same, just some order changes, but I'm getting error from CI like below when I don't change this file with the below command;

ERROR: vendor directory *not* up-to-date, it did get modified by 'GO111MODULE=on go mod vendor':

https://prow.k8s.io/view/gcs/kubernetes-jenkins/pr-logs/pull/kubernetes-csi_external-provisioner/327/pull-kubernetes-csi-external-provisioner-unit/1187927085243437058

https://travis-ci.org/kubernetes-csi/external-provisioner/builds/603113049?utm_source=github_status&utm_medium=notification

@mucahitkurt mucahitkurt force-pushed the use-node-and-csinodeinfo-informers branch from d5b98b8 to b1bccd2 Compare October 26, 2019 03:00
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 26, 2019
…source request

Signed-off-by: Mucahit Kurt <mucahitkurt@gmail.com>
@mucahitkurt mucahitkurt force-pushed the use-node-and-csinodeinfo-informers branch from b1bccd2 to 17fa2a8 Compare October 26, 2019 03:30
@msau42
Copy link
Collaborator

msau42 commented Oct 26, 2019

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 26, 2019
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: msau42, mucahitkurt

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 26, 2019
@k8s-ci-robot k8s-ci-robot merged commit 4160887 into kubernetes-csi:master Oct 26, 2019
@mucahitkurt mucahitkurt deleted the use-node-and-csinodeinfo-informers branch October 27, 2019 08:07
kbsonlong pushed a commit to kbsonlong/external-provisioner that referenced this pull request Dec 29, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. release-note-none Denotes a PR that doesn't merit a release note. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Consider using Node & CSINodeInfo informers
4 participants