Use a more conservative approach for SOVERSION #1505
Providing any shared library compatibility at all means that when a user does a non-ABI-breaking update to their htslib installation, any previously installed (older) On ELF platforms (at least), the solution to the problem you describe is indeed related to symbol versioning. (Or in a distro context, the other solution is package dependencies in addition to shared library file dependencies. See below.) In the past, HTSlib has declined to take on this maintenance burden. Debian do some symbol version tracking in their packaging (see debian/libhts3.symbols) but I don't believe this results in any actual symbol versioning being applied in their libhts.so.* as packaged. Oddly enough, they have never bothered contributing any of this upstream.
That can be considered a bug in Fedora's packaging. You should file a bug with them suggesting that they add an appropriate |
Agreed with John above. It's probably fair to say however that the samtools and bcftools configure scripts ought to check specific versions better, and we should document the minimum htslib version needed in the INSTALL file too which lists other dependencies. This would make it easier for distributions to specify the dependencies correctly. |
Users will even get a better experience when they also update samtools at the same time (which implies a rebuild). I can't see a scenario where updating htslib is possible, but rebuilding samtools and bcftools imposes an unreasonable burden. Also, shared libraries with different soversions are co-installable. An older samtools could still use libhts.so.3, while a hypothetical libhts.so.4 is available at the same time.
Debian's approach is totally distro-specific. Although it bears some similarities to proper symbol versioning, it is different. It also only happens at the package manager level; the linker is totally unaware of it.
This would be a very cumbersome, and very fragile approach.
As you can hopefully see, the current scheme has several drawbacks, and is quite fragile. Every step to make it work better is currently distro-specific and prone to fail due to human error. The backwards compatibility benefit IMHO is mostly a theoretical one. Encouraging people to also update bcftools and samtools has its own merits. Third-party tools can either be rebuilt, or keep using a co-installed older htslib version. |
You seem to largely know what you're talking about. Rest assured that other people do too. To be sure, “appropriate” was doing a lot of heavy lifting in my sentence. As James noted, upstream samtools & bcftools could do more to make their particular HTSlib requirements more clear, and thus take up the maintenance burden of making that information available to packagers who may wish to know it. Your point (3)'s use of libhts3/libhts4 package naming betrays a Debian (or similar) mindset. If you're going to use a Fedora example, you should probably expect people to reply to it using Fedora package naming conventions. |
Incidentally this is what Debian, Bioconda, and NetBSD do. |
What situation have you come across where this is causing a problem? Normally you would either build samtools and htslib together, or install them together, so the versions should match. Note that having too many versions of the library around can cause other problems. It's not unknown for third-party packages to link HTSlib more than once via multiple dependencies. If these end up linking different copies of the library then the results are likely to be bad. Also, some package managers (notably Conda) have trouble dealing with multiple library versions. Having packages depend on various different HTSlibs would make it very difficult to install some of them together where that is the case. |
I'm not disagreeing that full symbol versioning would be a nice thing, but it's also quite a bit of work to maintain and there are risks there with accidents in mislabelling still causing problems, so it needs considerable infrastructure and testing/validation frameworks too. I'm not inclined to do this unless there is a specific demonstration of it being necessary, rather than just desirable.
I don't understand this point. It's perfectly valid for libhts3 to be provided by both package 1.16 and 1.20, and in fact is generally what happens. If a tool has a dependency on package version 1.16 and something else has a dependency on 1.20 causing it to be updated, then the tool with a dependency on 1.16 will carry on working just fine, without needing to install both at the same time. That's the definition of backwards compatible. I think the confusion here is how dependencies are written. My understanding is they list package versions rather than library versions, in which case it should all work fine (provided the dependency information is correctly written down, which Conda has shown is most definitely not a given). |
The confusion here is due to various comments mixing up the package naming conventions and parallel installation conventions of different distributions. Debian names its shared library runtime packages like Fedora names its shared library runtime packages like |
Thanks for the clarification, but I still don't really understand how it changes anything. The package names in Debian may be libhts3 and libhts4, but the packages still have version numbers. E.g. I currently have libcurl4 (version 7.58.0-2ubuntu3.20) and libcurl3-nss (7.58.0-2ubuntu3.20) both installed at the same time. Packages depend on the library SO version for their symbols, but also often specify the minimum version. E.g. samtools here depends on libhts3 version >= 1.10. I can see absolutely no reason why Fedora cannot do the same, nor any other system. I don't see this as fragile either. The definition of backwards compatibility is that if package A depends on libhts3 >= 1.10 and package B depends on libhts3 >= 1.16 then 1.16 will be installed and we have knowledge it'll still work for package A. Now if we lose the "3" and use a flat namespace for all libraries and we don't know that, say, libhts 1.20 provides a libhts.so.4 library as it had an ABI change, then we have to start being more precise, e.g. libhts >= 1.16 and < 1.20. It still works fine, although it's a bit more labour intensive (but that's the price you pay for a single flat naming scheme and not separating by ABI). Furthermore, it's obviously less problematic than having pretty much every release having an SO version bump, as that would just lead to a myriad of package versions and also make it far more likely that someone ends up using a buggy library for particular tools as people would just hard-code the version that worked when they released their tool. (If not, you're back to the same deal of tracking >= vers < vers, so it bought us nothing anyway). The right solution is indeed symbol versioning metadata, but it's quite onerous to keep up to date and from what I can see it only tends to be large packages with huge teams supporting them that do this.
I also don't see how it helps if your packaging naming isn't distinguishing between major SO version numbers anyway, as package maintainers would still have to manually track a list of which htslib release provides which SO version. It fixes run time complaints, but doesn't do anything to solve package dependencies I think. |
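The dependency arithmetic in the comment above can be sketched in a few lines of Python. This is only an illustrative model, not how dpkg or rpm resolve dependencies: the helper names `parse` and `satisfies` are hypothetical, and the ABI-breaking 1.20 release is the hypothetical one from the comment, not a statement about any real HTSlib release.

```python
def parse(version):
    """Turn '1.16' into (1, 16) so versions compare numerically, not as strings."""
    return tuple(int(part) for part in version.split("."))


def satisfies(installed, minimum, below=None):
    """True if minimum <= installed, and installed < below when an upper bound is given."""
    v = parse(installed)
    if v < parse(minimum):
        return False
    return below is None or v < parse(below)


# SONAME-named package (libhts3): a lower bound is enough, because an
# ABI-breaking release would ship under a different package name (libhts4).
installed = "1.16"
print(satisfies(installed, "1.10"))  # package A: libhts3 >= 1.10
print(satisfies(installed, "1.16"))  # package B: libhts3 >= 1.16

# Flat package name (libhts): the upper bound has to be written by hand
# once the (hypothetical) ABI-breaking 1.20 is known to exist.
print(satisfies("1.19", "1.16", below="1.20"))
print(satisfies("1.20", "1.16", below="1.20"))
```

Comparing tuples rather than strings matters here: as plain strings, "1.9" would sort after "1.10".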
If the libraries being built against contain versioned symbols, But in general, I agree with everything you say — I haven't seen many libraries other than glibc itself actually doing this. |
Maybe you did not look closely enough? Querying a sizable subset of the distro, there surely must be a name you recognize in https://paste.opensuse.org/16294206 .
Right, that's just for verification. In a way, that file is the left-hand argument for a tool like abidiff (even though abidiff probably is not the one implementation that is getting used in Debian building). |
Maybe I haven't looked recently. I looked at a few of those you listed that appear to be efforts by small development teams, similar to HTSlib's position. Libjansson just uses Readline appears on your list, but does not have versioned symbols on Alma Linux and does not appear to maintain a map file upstream. So I assume this is added in your distribution's packaging. I see that libacl, ncurses, xz, and zlib do in fact maintain their own exports, ncurses.map, liblzma*.map, and zlib.map files upstream. I am somewhat surprised, as I expected that in more cases this would be something added by distributions. On Alma Linux at least, some of these version scripts are not used: e.g. ncurses's symbols are not versioned on this distribution. I note that GSL, GMP, MPFR, MPC, and OpenBLAS are fairly major numerics libraries that do not appear on your list and do not appear to version their symbols (at least on Alma Linux). Perhaps those advocating for HTSlib symbol versioning would like to propose a draft PR showing what would be required. Or perhaps they could provide some answers to help overcome the maintainers' reluctance to take on this maintenance burden. I'm no longer one of the core maintainers, but for me the initial questions would be:
(Rest assured that the maintainers will be familiar with Drepper's DSO Howto.) |
If we or someone else is to do the leg work for this, we need to know what problem we're actually trying to solve. Specifically:
If distributions do not track individual symbols and only have a package named after SONAME (Debian) or worse after the overall package name (RedHat?), then how does recording which symbols appeared in which release solve things for the distribution makers? I can't see that it changes anything.
My understanding of this is that adding symbol versioning will "solve" this by causing the application to complain about an incompatible library at the run-time link phase; so when we launch the application rather than when the function is first used. Is that correct? If so we haven't helped the user in any meaningful way. A failure is still a failure. Note I'm not saying that we shouldn't improve things. I think we've been a bit slack in reporting which htslib versions are compatible with which samtools and bcftools packages. Stating this very clearly (and optionally also enforcing it during builds via configure checks) would greatly simplify the job of producing correct packages for the distribution vendors. For now we've sort of run with an implicit assumption that samtools 1.X needs to compile against htslib >= 1.X, and as a run-time binary it needs the correct SONAME htslib with version >= 1.X. Now that's not entirely true as sometimes we can get away with e.g. samtools 1.9 using a copy of htslib 1.8 (I haven't checked - this is purely an example) as it's not using any newer functions, but for safety and simplicity it's easier to assume equal or newer version is OK. |
I think the answer to both those questions — from the distribution point of view — is that it enables the complaint and failure to occur at package upgrade time. Which is well before the program/application is run/launched. As described in #1505 (comment), if libhts.so.3 is built with symbol versioning, Then when a user has htslib-1.12 and samtools-1.12 installed, and types (In practice of course, the user is probably using Whether this is less or more work to maintain than, and whether it enables more functionality than, simply adding the “this samtools needs htslib v1.X” information to samtools's README so that distributions can easily update the |
(GMP has not added any functions since the last SONAME changes.)
|
I'm sorry, but why is GSL "doing it wrong too"? Your own output shows it has 0 removed and 0 changed functions, only added ones. Maybe I'm just being thick, but I still don't understand what's actually wrong here. Using your example, say someone built their own tool linked against libgsl.so.23.0.0 (which is almost certainly using a symlink from libgsl.so.23 via the SONAME). We then update the system installed libgsl package and it replaces libgsl.so.23.0.0 with libgsl.so.23.1.0. The symlink moves, but our application still links against libgsl.so.23 and the program continues working because the new library is backwards compatible. There is no need to boost SONAME here (as suggested in your option 1), and doing so would break this application unless we install both side by side (which is possible, but needs support from the packaging system). Also remember: not every application is installed via the package manager. That needs to be a major consideration too. (Plus it's going to be a total nightmare for things like conda.) So option 1 is a complete non-starter. So what about your option 2 of using a linker version script? In this scenario this makes zero difference to the user, as the program works anyway. What about if we downgraded the library? If the user compiled their application using a newly-added function in libgsl.so.23.1.0 and we downgraded to libgsl.so.23.0.0 then yes there would be a runtime error. This is a genuine problem, although (conda bugs excepted) it's very rare for people to downgrade libraries instead of upgrading them. However adding symbols here in the library won't make their application work, it just gives a different error. From your perspective of trying to make a distribution package, I now see via John's reply that it may help with some automated tools such as rpmbuild. You complain that it's taken "15-comment-back-and-forth" but maybe it would have been faster if you had replied to my questions.
So far every answer I've had (and Rob too) has come from John... If you want us to consider this, please explain what the exact problem is you are solving and how adding a symbol version file will solve it. Right now I can't see any benefit over just specifying min and max version numbers in package dependencies, which is considerably less work for us. |
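The upgrade and downgrade cases in the comment above reduce to a set comparison between what an application imports and what the installed library exports. The sketch below is a toy model under stated assumptions: the symbol names are made up (real libgsl exports far more), and `missing_symbols` is a hypothetical helper rather than anything the dynamic linker exposes.

```python
def missing_symbols(needed, exported):
    """Symbols the application imports that the library does not export."""
    return sorted(set(needed) - set(exported))


# Hypothetical export lists for two releases sharing the SONAME libgsl.so.23.
libgsl_23_0_0 = {"gsl_foo", "gsl_bar"}
libgsl_23_1_0 = libgsl_23_0_0 | {"gsl_new_in_23_1"}  # additions only

app_built_against_old = {"gsl_foo"}
app_built_against_new = {"gsl_foo", "gsl_new_in_23_1"}

# Upgrade: the old application finds everything it needs in the newer library.
print(missing_symbols(app_built_against_old, libgsl_23_1_0))  # []

# Downgrade: the newer application needs a symbol the older library lacks.
print(missing_symbols(app_built_against_new, libgsl_23_0_0))  # ['gsl_new_in_23_1']
```

In this model a version script does not change either result. It only changes when the second case is detected and how it is reported.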
Ah, so am I right in thinking that this request stems from the openSUSE? That states:
Unfortunately this is opposed to the Debian policy on shared libraries which says:
Currently we follow the Debian policy, doing both might be tricky. For what it's worth, Fedora seems to be a bit wishy-washy on the subject, both in the packaging guidelines and in the notes for C and C++ packages. The part about explicit requires implies that they don't expect All distributions seem to like symbol versioning. Although if we did that correctly, our |
If I understand this, it solves the future-knowledge problem. We can write a package stating it needs X >= 1.10 and < 1.20 if package X 1.20 has an ABI breaking change and has been released, but before that point we obviously cannot set the upper bound on version numbers. We'd have to go back and update the previous package metadata (which is doable, and probably should be done on any distro offering LTS to prevent incorrect upgrading). However the alternative here is needs SONAME 3 and X >= 1.10, as installing X 1.20 with SONAME 4 would invalidate it anyway so no upper limit on version number is even needed. Where symbol versioning has been useful for me in the past is binary compatibility between systems, either to another distro or more likely to another release of the same distro. That's equivalent to the package downgrade case I listed above. In this situation we're not even using a package manager for the application either. We just have a user binary built/linked on system A and being run on system B, with both systems having the same SONAME. In this regard, the symbol versioning provides an immediate check of compatibility rather than a potential crash later on. I think that's the more compelling reason to do this work, but I'm still intrigued to know the actual details of @StefanBruens's problem, which is still guesswork. |
Easy: 1. The current set of delivered symbols (functions, variables) stays unversioned; (or) 2. a previously unversioned symbol can gain a version without ill effect on backward compatibility; (or) 3. bump SONAME once more; (or) 4. an exception at the distro level, because things "can't get worse". |
All Provides are independent variables. Using e.g. htslib 1.10, htslib 1.13 and (an imaginary) htslib 1.99:
When you build a package depending on libhts 1.13, you get the dependency on libhts.so.3 automatically. Without any further manual deps, htslib 1.10 would be sufficient. Manually adding For openSUSE, we dislike the manual approach because, well, it is manual and needs extra work in every dependent package. For Fedora, this is not correctly solvable without introducing another manual provides. |
|
Are you saying that Fedora has no way of adding an SONAME dependency? I'm not talking about package names (i.e. libhts vs libhts3 naming), but the SONAME field in the binary? I was assuming you could say package X depends on libhts >= 1.10 && SONAME == "libhts.so.3".
All the information is already there. Adding more version fields would help certain things such as automatically knowing when to upgrade htslib rather than relying on correctly listing it in dependencies, but if the package manager chooses to not put the soname into the package name and doesn't have an alternative such as the basic task of reading SONAME then frankly "they're doing it wrong". That's not to say symbol versioning wouldn't help, but I'm genuinely baffled as to why things as they currently stand make it impossible to do reliable package management on Fedora. |
Of course Fedora adds the SONAME (automatically). But, referring to my previous example:
-> Both dependencies are satisfied, but X will crash. |
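A toy model may help show why a SONAME-only dependency can be satisfied while the program still fails. The sketch assumes, as in the discussion, that X was built against htslib 1.13 and uses a function added there, while htslib 1.10 is what is installed. The capability strings imitate the `soname(VERSION_NODE)` form that RPM generates for libraries with versioned symbols. The node names `HTSLIB_1.10` and `HTSLIB_1.13` are hypothetical, as is the helper `unmet`.

```python
def unmet(requires, provides):
    """Capabilities a package requires that the installed set does not provide."""
    return sorted(set(requires) - set(provides))


# What an installed htslib 1.10 package might provide, without and with
# symbol versioning in libhts.so.3.
htslib_1_10_plain = {"libhts.so.3"}
htslib_1_10_versioned = {"libhts.so.3", "libhts.so.3(HTSLIB_1.10)"}

# X was built against htslib 1.13 and calls a function first added in 1.13.
x_requires_plain = {"libhts.so.3"}
x_requires_versioned = {"libhts.so.3", "libhts.so.3(HTSLIB_1.13)"}

# SONAME only: nothing is unmet, so X installs and then fails at run time.
print(unmet(x_requires_plain, htslib_1_10_plain))  # []

# Versioned symbols: the gap is visible to the package manager at install time.
print(unmet(x_requires_versioned, htslib_1_10_versioned))  # ['libhts.so.3(HTSLIB_1.13)']
```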
Thank you. I understand this now. So it's simply an issue that the X package has incomplete dependencies as it hasn't correctly specified the minimum version of htslib it needs. As I said before, we can make this more explicit for samtools to aid packaging, but we obviously can't do anything for external generic package X. Is it the case that if we use symbol versioning, and the dependency in X becomes a "provides symbol-set Y" rather than "release > Z" type of dependency, that this is automated via objdump, nm, etc? I think other distributions automate this already even without explicit symbol versioning by maintaining their own list of symbols per package version, but I assume that machinery is absent in Fedora and/or openSUSE. We can look into adding symbol versioning, but I wouldn't want it unless we can automate it as otherwise we're almost certainly going to have an accident at some stage, and explicit-but-incorrect metadata is going to be worse than no metadata. Given we already have strict symbol visibility macros, I think automation is doable via some sort of nm vs existing map file analysis to see what's exported and identify when the new library has additional symbols vs the old one (without explicitly having a copy of that old library to hand), but I don't know how long it'd take to write and debug such a system. If you know of a good starting point or have ideas then we'd be amenable to PRs. |
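One possible shape for the "nm vs existing map file" analysis is sketched below. It is a starting point under stated assumptions, not an existing HTSlib script. It assumes a GNU ld version script and text in the three-column format printed by `nm -D --defined-only`. The node names `HTSLIB_1.0` and `HTSLIB_1.1`, the symbol `hts_new_function` and all three function names are hypothetical.

```python
import re


def symbols_in_map(map_text):
    """Collect every symbol name already assigned to some version node."""
    text = re.sub(r"/\*.*?\*/", "", map_text, flags=re.S)  # drop C-style comments
    # An identifier directly followed by ';' is a symbol entry. Parent node
    # references such as '} HTSLIB_1.0;' contain a dot and so do not match.
    return set(re.findall(r"(?<![\w.])([A-Za-z_]\w*)\s*;", text))


def exported_symbols(nm_output):
    """Parse `nm -D --defined-only` style lines: address, type letter, name."""
    names = set()
    for line in nm_output.splitlines():
        fields = line.split()
        if len(fields) == 3 and fields[1] in {"T", "D", "B", "R", "W"}:
            names.add(fields[2].split("@")[0])  # strip any @@VERSION suffix
    return names


def new_version_node(node, parent, map_text, nm_output):
    """Return version-script text for symbols exported but not yet in the map."""
    added = sorted(exported_symbols(nm_output) - symbols_in_map(map_text))
    if not added:
        return ""
    body = "".join(f"    {name};\n" for name in added)
    return f"{node} {{\n  global:\n{body}}} {parent};\n"


old_map = """
HTSLIB_1.0 {
  global:
    hts_open;
    hts_close;
  local:
    *;
};
"""
nm_out = """\
0000000000001000 T hts_open
0000000000001100 T hts_close
0000000000001200 T hts_new_function
"""
print(new_version_node("HTSLIB_1.1", "HTSLIB_1.0", old_map, nm_out))
```

Because the existing map file records what earlier releases exported, no copy of the old library is needed. The same two sets could also flag a symbol that is listed in the map but no longer exported, which would indicate an ABI break rather than an addition.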
Here's a draft of what htslib.map might look like for HTSlib. Patch Makefile as follows (eventually this should be probed for in configure.ac), and be sure to link
@@ -343,8 +343,8 @@
 # As a byproduct invisible to make, libhts.so.NN is also created, as it is the
 # file used at runtime (when $LD_LIBRARY_PATH includes the build directory).
-libhts.so: $(LIBHTS_OBJS:.o=.pico)
-	$(CC) -shared -Wl,-soname,libhts.so.$(LIBHTS_SOVERSION) $(LDFLAGS) -o $@ $(LIBHTS_OBJS:.o=.pico) $(LIBS) -lpthread
+libhts.so: $(LIBHTS_OBJS:.o=.pico) htslib.map
+	$(CC) -shared -Wl,-soname,libhts.so.$(LIBHTS_SOVERSION) -Wl,-version-script,$(srcprefix)htslib.map $(LDFLAGS) -o $@ $(LIBHTS_OBJS:.o=.pico) $(LIBS) -lpthread
 	ln -sf $@ libhts.so.$(LIBHTS_SOVERSION)
Happily a large part of trimming this list is already done by the symbol visibility work. It would be good to regenerate this via an independently produced hacky |
Many thanks for this John. It's a great start to backfill the historical data. We'll look at automation to keep it up to date via nm/awk hackery or similar. (And of course with automation to try it out on historical versions.) |
Thanks for all the work! Distributions work would be so much easier if every project took such concerns seriously. |
Currently, the SOVERSION is only bumped when there are ABI changes which are not backwards compatible.
This is typically insufficient for several reasons:
There are two approaches:
For (1.), see e.g. https://fedora.pkgs.org/36/fedora-x86_64/samtools-1.13-2.fc36.x86_64.rpm.html - the package only requires libhts.so.3, but it will likely fail with htslib=1.12.0. Same is true for e.g. (open)SUSE.