Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sysinfo: Ignore "hidden" sysfs device entries #3260

Merged
merged 1 commit into from
Aug 4, 2023

Conversation

sagigrimberg
Copy link
Contributor

Some devices are "hidden" from userspace due to various reasons. One reason is native nvme multipathing, where the path devices do not present a device node to the user but only exposes the stacking device as the only nvme device. Still there are sysfs attributes for these hidden devices, but we should ignore them altogether because these are not "real" devices at all.

@k8s-ci-robot
Copy link
Collaborator

Hi @sagigrimberg. Thanks for your PR.

I'm waiting for a google member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@bobbypage
Copy link
Collaborator

Thanks for the PR. What is the issue this is solving currently? Are these devices somehow reported currently?

@sagigrimberg
Copy link
Contributor Author

Kubelet fails to access these devices, preventing csi metrics collection.

kubelet[4192]: E0102 10:01:46.083009    4192 info.go:99] Failed to get disk map: open /sys/block/nvme0c1n1/dev: no such file or directory

@bobbypage
Copy link
Collaborator

Got it. Is the /hidden a a common way that devices are reported hidden via sysfs? Is this documented somewhere?

@sagigrimberg
Copy link
Contributor Author

Yes, it is a common way. You can also refer to util-linux/lsblk.c that also does the same thing...

Unfortunately, we didn't add a proper documentation entry for it in sysfs/block ABI documentation
I can do that.

@sagigrimberg
Copy link
Contributor Author

ping?

@sagigrimberg
Copy link
Contributor Author

How can I trigger tests on this PR?

@iwankgb
Copy link
Collaborator

iwankgb commented Mar 14, 2023

/ok-to-test

@sagigrimberg
Copy link
Contributor Author

Can I get a review on this?

@sagigrimberg
Copy link
Contributor Author

ping?

@sagigrimberg
Copy link
Contributor Author

@iwankgb @bobbypage ?

utils/sysfs/sysfs.go Outdated Show resolved Hide resolved
@sagigrimberg
Copy link
Contributor Author

Any reason why this PR was not merged already? @bobbypage @iwankgb ?

@sagigrimberg
Copy link
Contributor Author

@bobbypage this is sitting here forever now, any reason why this is not taken?

@sagigrimberg
Copy link
Contributor Author

/retest

@sagigrimberg
Copy link
Contributor Author

retest

@sagigrimberg
Copy link
Contributor Author

@bobbypage can we get this in please? The failed test looks completely unrelated to the change?

This has been sitting here forever and I don't understand why this should be the case TBH. This PR has been primarily ignored while fixing a real issue in cadvisor.

@sagigrimberg sagigrimberg force-pushed the ignore-hidden-devices branch 2 times, most recently from ea1dac6 to 2393946 Compare June 26, 2023 12:09
@sagi-lb
Copy link

sagi-lb commented Jun 26, 2023

/retest

@iwankgb
Copy link
Collaborator

iwankgb commented Jun 26, 2023

@sagigrimberg, while I understand your sentiment regarding maintainers not reviewing code in timely manner, I have to say that majority of us are volunteers and we work on cAdvisor in our free time.
Take it into consideration when assigning blame for ignoring important changes.

@sagigrimberg
Copy link
Contributor Author

@iwankgb I was not assigning blame. I was just stating a fact. This PR was primarily ignored. This PR stands since March 2'nd, which since then cadvisor accumulated a few dosands of of PRs merged, while this one has gone, presumably ready to merge since about then (outside of breaking lately and me fixing it after a rebase).

If you don't have free time to work on the project, or you don't think that it is a suitable fix for the project, or if you think that there is risk associated it is fine, but it would be appreciated to get at least some feedback...

Glad this at least got some attention though...

// https://git.kernel.org/pub/scm/utils/util-linux/util-linux.git
// - c8487d854ba5 ("lsblk: Ignore hidden devices")
hidden, err := os.ReadFile(path.Join(blockDir, name, "/hidden"))
if err != nil {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Non-nil error that is not os.ErrNotExists must not be silently ignored. Returned wrapped error and handle it accordingly wherever this function is called, please.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense @iwankgb, fixing this now.

return false, nil
}
if err != nil {
return false, fmt.Errorf("failed to read %s: %s", devHiddenPath, err)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would rather wrap the error:

Suggested change
return false, fmt.Errorf("failed to read %s: %s", devHiddenPath, err)
return false, fmt.Errorf("failed to read %s: %w", devHiddenPath, err)

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed, thanks!

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some devices are "hidden" from userspace due to various
reasons. One reason is native nvme multipathing, where the
path devices do not present a device node to the user but
only exposes the stacking device as the only nvme device.
Still there are sysfs attributes for these hidden devices, but
we should ignore them altogether because these are not "real"
devices at all.

Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
@iwankgb iwankgb merged commit a9bf713 into google:master Aug 4, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants