Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Detect rbd_clone4 support by using dlsym() #1013

Merged
merged 3 commits into from
Jul 25, 2024

Conversation

nixpanic
Copy link
Member

The dlsym package provides LookupSymbol() which resolves a named symbol
(like "rbd_clone4") and returns a unsafe.Pointer to it. The caller is
expected to cast the symbol to function that can be called.

Some versions of librbd provide the rbd_clone4 function, and others do
not. Squid will have the function backported in the 1st update, the
initial release of Squid does not have it. This makes checking for the
function based on the named Ceph version impractical.

With the new dlsym.LookupSymbol() function, it is now possible to check
the availability of rbd_clone4 during runtime. If the symbol is not
found ErrNotImplemented is returned, which can be used to detect the
unavailability of the function.

Checklist

  • Added tests for features and functional changes
  • Public functions and types are documented
  • Standard formatting is applied to Go code
  • Is this a new API? Added a new file that begins with //go:build ceph_preview
  • Ran make api-update to record new APIs

New or infrequent contributors may want to review the go-ceph Developer's Guide including the section on how we track API Status and the API Stability Plan.

The go-ceph project uses mergify. View the mergify command guide for information on how to interact with mergify. Add a comment with @Mergifyio rebase to rebase your PR when github indicates that the PR is out of date with the base branch.

@nixpanic
Copy link
Member Author

pre-squid runs the test cases with success:
image

quincy skips the test cases as expected:
image

@nixpanic nixpanic requested a review from anoopcs9 July 12, 2024 17:23
@phlogistonjohn
Copy link
Collaborator

Very interesting. I will try to dig into it in more detail soon!

also cc: @ansiwen

Copy link
Collaborator

@anoopcs9 anoopcs9 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm.

@anoopcs9 anoopcs9 requested a review from ansiwen July 16, 2024 09:07
@anoopcs9 anoopcs9 added the API This PR includes a change to the public API of a go-ceph package label Jul 16, 2024
@ansiwen
Copy link
Collaborator

ansiwen commented Jul 16, 2024

No specific concerns, technically well done.

My only concern is, that I think this is a case of over-engineering to add the libdl dependency in general for such an edge-case. I'm not really sure about the side effects on non-Linux and MUSL systems. But I will not veto it, if a majority prefers this change. I just wanted to note here that I would prefer to just not support the first release of Squid.

@nixpanic
Copy link
Member Author

@ansiwen, one of the issues is that the Ceph team considers other new librdb APIs (features) as backports for minor Ceph releases. The rbd_clone4 function is just the 1st where this happens, #1010 is related to a 2nd candidate for backporting to Squid.

I am not an enormous fan of the proposed solution either, but it is simpler for users of go-ceph than handling build-tags for each API that gets backported.

@phlogistonjohn
Copy link
Collaborator

My only concern is, that I think this is a case of over-engineering to add the libdl dependency in general for such an edge-case. I'm not really sure about the side effects on non-Linux and MUSL systems.

That's a fair point, would a tag like nodlsym that disables this file be a viable addition for those kinds of platforms? Then if we see people file issues related to dlsym we can say 'try with build tag "nodlsym"' but default it to on for the more common users of Linux+glibc?

@anoopcs9
Copy link
Collaborator

The rbd_clone4 function is just the 1st where this happens, #1010 is related to a 2nd candidate for backporting to Squid.

Looking at the nature of APIs suggested via #1010 I was wondering whether "dlsym" approach can help us with all those variables/enums coming along with it.

@nixpanic
Copy link
Member Author

The rbd_clone4 function is just the 1st where this happens, #1010 is related to a 2nd candidate for backporting to Squid.

Looking at the nature of APIs suggested via #1010 I was wondering whether "dlsym" approach can help us with all those variables/enums coming along with it.

It works for new functions and structs, not enums or functions that were changed. Modifying existing public functions is not something that should be done in any case, it will break existing applications that use them. Introduction of new functions while keeping (possibly deprecating) the old ones is a more sane approach.

@nixpanic
Copy link
Member Author

@ansiwen and @phlogistonjohn I've dropped the CGo linker directive to link against libdl. It seems the CGo linker is quite forgiving, and it works without explicitly mentioning -ldl.

Do you have any other concerns that I should address?

@nixpanic
Copy link
Member Author

@ansiwen and @phlogistonjohn I've dropped the CGo linker directive to link against libdl. It seems the CGo linker is quite forgiving, and it works without explicitly mentioning -ldl.

Do you have any other concerns that I should address?

Oh, interesting! Local testing with the make test-container CEPH_VERSION=... worked. Not sure whats different in these CI jobs, but obviously the CGo linker is more strict there 🤔

@nixpanic nixpanic force-pushed the dlsym/rbd_clone4 branch 2 times, most recently from 8faeec4 to 91e143f Compare July 23, 2024 08:48
// clear dlerror before looking up the symbol
C.dlerror()
// resolve the address of the symbol
sym := C.dlsym(nil /* C.RTLD_DEFAULT */, cSymName)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On MacOS this must be set to RTLD_DEFAULT, NULL doesn't work.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately C.RTLD_DEFAULT could not be found for some reason. Maybe a header or define is missing, I'll try to figure that out.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a define for RTLD_DEFAULT, it should be in the header, but migth be missing in some Ceph container(s)?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's weird. Would be good to know why. How can dlsym be defined, but RTLD_DEFAULT not? 🤔

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's weird. Would be good to know why. How can dlsym be defined, but RTLD_DEFAULT not? 🤔

Yeah, I was not able to find out. Each dlfcn.h that I checked had RTLD_DEFAULT without #ifdef ... or something around it.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#1026 might be the key here.

internal/dlsym/dlsym.go Show resolved Hide resolved
rbd/clone_image_by_id.go Outdated Show resolved Hide resolved
The dlsym package provides LookupSymbol() which resolves a named symbol
(like "rbd_clone4") and returns a unsafe.Pointer to it. The caller is
expected to cast the symbol to function that can be called.

Signed-off-by: Niels de Vos <ndevos@ibm.com>
When there is a failure during the CloneFromGroupSnap test case, the
rbd-group snapshot was not removed, preventing images from being deleted
as well.

Signed-off-by: Niels de Vos <ndevos@ibm.com>
Some versions of librbd provide the rbd_clone4 function, and others do
not. Squid will have the function backported in the 1st update, the
initial release of Squid does not have it. This makes checking for the
function based on the named Ceph version impractical.

With the new dlsym.LookupSymbol() function, it is now possible to check
the availability of rbd_clone4 during runtime. If the symbol is not
found ErrNotImplemented is returned, which can be used to detect the
unavailability of the function.

Signed-off-by: Niels de Vos <ndevos@ibm.com>
@anoopcs9
Copy link
Collaborator

The rbd_clone4 function is just the 1st where this happens, #1010 is related to a 2nd candidate for backporting to Squid.

Looking at the nature of APIs suggested via #1010 I was wondering whether "dlsym" approach can help us with all those variables/enums coming along with it.

It works for new functions and structs, not enums or functions that were changed. Modifying existing public functions is not something that should be done in any case, it will break existing applications that use them. Introduction of new functions while keeping (possibly deprecating) the old ones is a more sane approach.

Shouldn't we document the situation with enums/variables from original Ceph API referenced in the corresponding go-ceph API?

Nevertheless I think its good to have this new approach documented within development.md.

Copy link
Collaborator

@ansiwen ansiwen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM now, I wondered if we could simplify the whole construct, and make it more reusable, but we can do that, when the second case comes in.

I set do-not-merge to give @nixpanic a chance to react to my further comments. Just remove it when you are ready to merge.

@nixpanic
Copy link
Member Author

nixpanic commented Jul 24, 2024

Shouldn't we document the situation with enums/variables from original Ceph API referenced in the corresponding go-ceph API?

Nevertheless I think its good to have this new approach documented within development.md.

@anoopcs9 , I think enums and other #defines work are well understood by developers that are familiar with C. They are not symbols in a shared library, so dlsym() can not detect them either. This PR does not use enums or new librbd #defines, so I think documenting a potential workaround for those is out of scope here?

@nixpanic nixpanic closed this Jul 24, 2024
@nixpanic nixpanic reopened this Jul 24, 2024
@anoopcs9
Copy link
Collaborator

anoopcs9 commented Jul 25, 2024

Shouldn't we document the situation with enums/variables from original Ceph API referenced in the corresponding go-ceph API?

Nevertheless I think its good to have this new approach documented within development.md.

@anoopcs9 , I think enums and other #defines work are well understood by developers that are familiar with C. They are not symbols in a shared library, so dlsym() can not detect them either. This PR does not use enums or new librbd #defines, so I think documenting a potential workaround for those is out of scope here?

Even otherwise I was more interested in documenting the LookupSymbol() helper from dlysm package as a hint to someone who could end up in such a situation where an API is only available with further minor updates. I won't insist though as others haven't shown interest in such a documentation effort yet.

I'm removing the do-not-merge label.

Copy link
Collaborator

@anoopcs9 anoopcs9 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm.

@mergify mergify bot merged commit a9ce294 into ceph:master Jul 25, 2024
27 of 28 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API This PR includes a change to the public API of a go-ceph package
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants