Use a more conservative approach for SOVERSION #1505

Closed
StefanBruens opened this issue Sep 8, 2022 · 27 comments · Fixed by #1560

@StefanBruens

Currently, the SOVERSION is only bumped when there are ABI changes which are not backwards compatible.

This is typically insufficient for several reasons:

  1. Many distributions do not track individual exported symbols, but only the SONAME. Versions of htslib that are too old won't be detected.
  2. The Linux linker has no notion of backwards compatibility; the SONAME is treated as an opaque name. Using a libhts that is too old with a current samtools (for example) will crash on the first use of a newly added symbol, i.e. after some arbitrary amount of runtime.

There are two approaches:

  1. Bump SOVERSION on every ABI change, including backwards-compatible ones. This has (or may have) the drawback of requiring dependent tools to be rebuilt more often. For bcftools and samtools this is less of an issue, as they depend on the latest htslib anyway.
  2. Use a linker version script. This is an approach used by e.g. (GNU) libc and Qt. The SONAME stays the same on backwards compatible changes, but new symbols are annotated with the library version which added the symbol. This is the cleaner approach, but requires significantly more work.
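For readers unfamiliar with approach (2), here is a minimal Python sketch that renders a GNU ld version script from a mapping of version nodes to symbols. The node names follow the `HTSLIB_1.x` style used later in this thread, but the symbol names and their assignment to versions are purely hypothetical placeholders, not HTSlib's real export list.

```python
def render_version_script(versions):
    """Render a GNU ld version script from an ordered {node: [symbols]} mapping.

    The first node gets a `local: *;` catch-all that hides every symbol not
    listed explicitly. Each later node inherits from the node before it.
    """
    lines = []
    previous = None
    for node, symbols in versions.items():
        lines.append(f"{node} {{")
        lines.append("    global:")
        for symbol in sorted(symbols):
            lines.append(f"        {symbol};")
        if previous is None:
            lines.append("    local:")
            lines.append("        *;")
            lines.append("};")
        else:
            lines.append(f"}} {previous};")
        previous = node
    return "\n".join(lines) + "\n"


# Hypothetical data: placeholder symbols, not the real HTSlib API history.
EXAMPLE_VERSIONS = {
    "HTSLIB_1.0": ["example_open", "example_close"],
    "HTSLIB_1.14": ["example_new_feature"],
}

if __name__ == "__main__":
    print(render_version_script(EXAMPLE_VERSIONS))
```

The SONAME stays the same in this scheme; a binary that uses `example_new_feature` would record a need for the `HTSLIB_1.14` node, which an older library built from a shorter script would not provide.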

For (1.), see e.g. https://fedora.pkgs.org/36/fedora-x86_64/samtools-1.13-2.fc36.x86_64.rpm.html - the package only requires libhts.so.3, but it will likely fail with htslib=1.12.0. Same is true for e.g. (open)SUSE.

@jmarshall
Member

jmarshall commented Sep 9, 2022

Providing any shared library compatibility at all means that when a user does a non-ABI-breaking update to their htslib installation, any previously installed (older) samtools executables continue to work — and possibly function better than before, as they may benefit from bug fixes in the library. Your approach (1) clearly destroys that.

On ELF platforms (at least), the solution to the problem you describe is indeed related to symbol versioning. (Or in a distro context, the other solution is package dependencies in addition to shared library file dependencies. See below.) In the past, HTSlib has declined to take on this maintenance burden. Debian do some symbol version tracking in their packaging (see debian/libhts3.symbols) but I don't believe this results in any actual symbol versioning being applied in their libhts.so.* as packaged. Oddly enough, they have never bothered contributing any of this upstream.

For (1.), see e.g. https://fedora.pkgs.org/36/fedora-x86_64/samtools-1.13-2.fc36.x86_64.rpm.html - the package only requires libhts.so.3, but it will likely fail with htslib=1.12.0.

That can be considered a bug in Fedora's packaging. You should file a bug with them suggesting that they add an appropriate Requires: htslib >= 1.x to their samtools.spec, so the resulting RPM would have that dependency in addition to the automatically inferred libhts.so.3()(64bit) dependency.

@jkbonfield
Contributor

Agreed with John above.

It's probably fair to say however that the samtools and bcftools configure scripts ought to check specific versions better, and we should document the minimum htslib version needed in the INSTALL file too which lists other dependencies. This would make it easier for distributions to specify the dependencies correctly.

@StefanBruens
Author

Providing any shared library compatibility at all means that when a user does a non-ABI-breaking update to their htslib installation, any previously installed (older) samtools executables continue to work — and possibly function better than before, as they may benefit from bug fixes in the library. Your approach (1) clearly destroys that.

Users will even get a better experience when they also update samtools at the same time (which implies a rebuild). I can't see a scenario where updating htslib is possible, but rebuilding samtools and bcftools puts an unreasonable burden.

Also, shared libraries with different soversions are co-installable. An older samtools could still use libhts.so.3, while a hypothetical libhts.so.4 is available at the same time.

On ELF platforms (at least), the solution to the problem you describe is indeed related to symbol versioning. (Or in a distro context, the other solution is package dependencies in addition to shared library file dependencies. See below.) In the past, HTSlib has declined to take on this maintenance burden. Debian do some symbol version tracking in their packaging (see debian/libhts3.symbols) but I don't believe this results in any actual symbol versioning being applied in their libhts.so.* as packaged. Oddly enough, they have never bothered contributing any of this upstream.

Debian's approach is totally distro specific. Although it bears some similarities to proper symbol versioning, it is different. It also only happens at the package manager level; the linker is totally unaware of it.

For (1.), see e.g. https://fedora.pkgs.org/36/fedora-x86_64/samtools-1.13-2.fc36.x86_64.rpm.html - the package only requires libhts.so.3, but it will likely fail with htslib=1.12.0.

That can be considered a bug in Fedora's packaging. You should file a bug with them suggesting that they add an appropriate Requires: htslib >= 1.x to their samtools.spec.

This would be a very cumbersome, and very fragile approach.

  1. The required version is unknown for the packager
  2. Even with a documented minor version, this is still a manual approach, the packager might miss any version bump
  3. It would not even work reliably in the way you have written it:
    Shared library packages in general don't provide something like foolib=<version>, but a symbol derived from the soname plus some tags like (64bit). We can manually provide something like htslib = <version> on top, but that is fragile - think of the hypothetical package set of libhts3=1.16, libhts3=1.20, libhts4=1.99, the first and third (coinstalled) would provide both libhts.so.3()(64bit) and htslib >= 1.20. So the correct dependency would be Requires: libhts3 >= 1.20.

As you can hopefully see, the current scheme has several drawbacks, and is quite fragile. Every step to make it work better is currently distro specific and prone to fail due to human error.

The backwards compatibility benefit IMHO is mostly a theoretical one. Encouraging people to also update bcftools and samtools has its own merits. Third party tools can either be rebuilt, or keep using a coinstalled older htslib version.

@jmarshall
Member

You seem to largely know what you're talking about. Rest assured that other people do too.

To be sure, “appropriate” was doing a lot of heavy lifting in my sentence. As James noted, upstream samtools & bcftools could do more to make their particular HTSlib requirements more clear, and thus take up the maintenance burden of making that information available to packagers who may wish to know it.

Your point (3)'s use of libhts3/libhts4 package naming betrays a Debian (or similar) mindset. If you're going to use a Fedora example, you should probably expect people to reply to it using Fedora package naming conventions.

@jmarshall
Member

You should file a bug with [Fedora] suggesting that they add an appropriate Requires: htslib >= 1.x to their samtools.spec.

This would be a very cumbersome, and very fragile approach.

Incidentally this is what Debian, Bioconda, and NetBSD do.

@daviesrob
Member

What situation have you come across where this is causing a problem? Normally you would either build samtools and htslib together, or install them together, so the versions should match.

Note that having too many versions of the library around can cause other problems. It's not unknown for third-party packages to link HTSlib more than once via multiple dependencies. If these end up linking different copies of the library then the results are likely to be bad. Also, some package managers (notably Conda) have trouble dealing with multiple library versions. Having packages depend on various different HTSlibs would make it very difficult to install some of them together where that is the case.

@jkbonfield
Contributor

jkbonfield commented Sep 9, 2022

I'm not disagreeing that full symbol versioning would be a nice thing, but it's also quite a bit of work to maintain, and there is still a risk that accidental mislabelling causes problems, so it needs considerable infrastructure and testing/validation frameworks too. I'm not inclined to do this unless there is a specific demonstration of it being necessary, rather than just desirable.

Shared library packages in general don't provide something like foolib=<version>, but a symbol derived from the soname plus some tags like (64bit). We can manually provide something like htslib = <version> on top, but that is fragile - think of the hypothetical package set of libhts3=1.16, libhts3=1.20, libhts4=1.99, the first and third (coinstalled) would provide both libhts.so.3()(64bit) and htslib >= 1.20. So the correct dependency would be Requires: libhts3 >= 1.20.

I don't understand this point. It's perfectly valid for libhts3 to be provided by both package 1.16 and 1.20, and in fact that is generally what happens. If a tool has a dependency on package version 1.16 and something else has a dependency on 1.20 causing it to be updated, then the tool with a dependency on 1.16 will carry on working just fine, without needing to install both at the same time. That's the definition of backwards compatible. I think the confusion here is how dependencies are written. My understanding is they list package versions rather than library versions, in which case it should all work fine (provided the dependency information is correctly written down, which Conda has shown is most definitely not a given).

@jmarshall
Member

The confusion here is due to various comments mixing up the package naming conventions and parallel installation conventions of different distributions.

Debian names its shared library runtime packages like libhtsSOVER, and enables both libhts.so.3 and an eventual libhts.so.4 to be installed by having the user install both libhts3 and libhts4 (which are packages with different names).

Fedora names its shared library runtime packages like htslib, and enables both libhts.so.3 and libhts.so.4 to coexist by having the user install multiple versions of the htslib package simultaneously (which is allowed and not enormously uncommon in RPM-based distributions).

@jkbonfield
Contributor

jkbonfield commented Sep 12, 2022

Thanks for the clarification, but I still don't really understand how it changes anything.

The package names in Debian may be libhts3 and libhts4, but the packages still have version numbers. Eg I currently have libcurl4 (version 7.58.0-2ubuntu3.20) and libcurl3-nss (7.58.0-2ubuntu3.20) both installed at the same time. Packages depend on the library so version for their symbols, but also often specify the minimum version. Eg samtools here depends on libhts3 version >= 1.10.

I can see absolutely no reason why Fedora cannot do the same, nor any other system. I don't see this as fragile either. The definition of backwards compatibility is that if package A depends on libhts3 >= 1.10 and package B depends on libhts3 >= 1.16 then 1.16 will be installed and we have knowledge it'll still work for package A.

Now if we lose the "3" and use a flat namespace for all libraries and we don't know that, say, libhts 1.20 provides a libhts.so.4 library as it had an ABI change, then we have to start being more precise, e.g. libhts >= 1.16 and < 1.20. It still works fine, although it's a bit more labour intensive (but that's the price you pay for a single flat naming scheme and not separating by ABI). Furthermore, it's obviously less problematic than having pretty much every release having an so version bump, as that would just lead to a myriad of package versions and also make it far more likely that someone ends up using a buggy library for particular tools as people would just hard-code the version that worked when they released their tool. (If not, you're back to the same deal of tracking >= vers < vers, so it bought us nothing anyway).
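The "at least this version, but below the next ABI break" check described above can be sketched in a few lines of Python. This is only an illustration under a simplifying assumption: versions are dotted integers, whereas real rpm and dpkg version comparison handles epochs, release suffixes, and more. The function names are made up for this example.

```python
def parse_version(text):
    """Turn a dotted version such as '1.16' into a tuple of ints: (1, 16)."""
    return tuple(int(part) for part in text.split("."))


def satisfies(installed, minimum, below=None):
    """Check `installed >= minimum`, and `installed < below` when a bound is given."""
    version = parse_version(installed)
    if version < parse_version(minimum):
        return False
    if below is not None and version >= parse_version(below):
        return False
    return True


if __name__ == "__main__":
    # Flat naming scheme: a tool needs libhts >= 1.16 and < 1.20.
    for candidate in ("1.10", "1.16", "1.19.1", "1.20"):
        print(candidate, satisfies(candidate, "1.16", below="1.20"))
```

Comparing tuples of integers rather than strings matters here: as strings, "1.9" would sort after "1.10".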

The right solution is indeed symbol versioning meta-data, but it's quite onerous to keep up to date and from what I can see it only tends to be large packages with huge teams supporting them that do this. I also don't see how it helps if your packaging naming isn't distinguishing between major SO version numbers anyway, as package maintainers would still have to manually track a list of which htslib release provides which SO version. It fixes run time complaints, but doesn't do anything to solve package dependencies I think.

@jmarshall
Member

package maintainers would still have to manually track a list of which htslib release provides which SO version

If the libraries being built against contain versioned symbols, rpmbuild and the deb equivalent can automatically record them as inferred dependencies. They already do this to automatically add a shared library dependency on libhts.so.3 (thus locking in the soversion requirement; this is what conda is missing). The multiple libm.so.6(GLIBC_2.x) requirements listed at the previously linked samtools RPM page were automatically added by rpmbuild.

But in general, I agree with everything you say — I haven't seen many libraries other than glibc itself actually doing this.

@jengelh

jengelh commented Sep 15, 2022

I haven't seen many libraries other than glibc itself actually doing this.

Maybe you did not look close enough? Querying a sizable subset of the distro, there surely must be a name you recognize in https://paste.opensuse.org/16294206 .

Debian do some symbol version tracking in their packaging (see debian/libhts3.symbols) but I don't believe this results in any actual symbol versioning being applied

Right, that's just for verification. In a way, that file is the left-hand argument for a tool like abidiff (even though abidiff probably is not the one implementation that is getting used in Debian building).

@jmarshall
Member

Maybe I haven't looked recently. I looked at a few of those you listed that appear to be efforts by small development teams, similar to HTSlib's position.

Libjansson just uses --default-symver to solve a different problem (name clashes with other JSON-related libraries), which does nothing for the problem at hand here. JSON-C introduced symbol versioning for the same reason, but does maintain json-c.sym upstream itself.

Readline appears on your list, but does not have versioned symbols on Alma Linux and does not appear to maintain a map file upstream. So I assume this is added in your distribution's packaging.

I see that libacl, ncurses, xz, and zlib do in fact maintain their own exports, ncurses.map, liblzma*.map, and zlib.map files upstream. I am somewhat surprised, as I expected that in more cases this would be something added by distributions. On Alma Linux at least, some of these version scripts are not used: e.g. ncurses's symbols are not versioned on this distribution.

I note that GSL, GMP, MPFR, MPC, and OpenBLAS are fairly major numerics libraries that do not appear on your list and do not appear to version their symbols (at least on Alma Linux).

Perhaps those advocating for HTSlib symbol versioning would like to propose a draft PR showing what would be required. Or perhaps they could provide some answers to help overcome the maintainers' reluctance to take on this maintenance burden. I'm no longer one of the core maintainers, but for me the initial questions would be:

  • What are the compatibility implications of turning on symbol versioning? Does it need to be / benefit from being done in conjunction with a soversion bump?
  • Is it important to backfill historically accurate version numbers for existing symbols when initially adding versioning?
  • What are the compatibility implications of mistakes made while maintaining this file? e.g. Failing to list a symbol? Inadvertently listing a symbol that should not have been listed? Listing a symbol with an incorrect version?

(Rest assured that the maintainers will be familiar with Drepper's DSO Howto.)

@jkbonfield
Contributor

jkbonfield commented Sep 15, 2022

If we or someone else is to do the leg work for this, we need to know what problem we're actually trying to solve. Specifically:

Currently, the SOVERSION is only bumped when there are ABI changes which are not backwards compatible.

This is typically insufficient for several reasons:

1. Many distributions do not track individual exported symbols, but only the SONAME. Versions of htslib that are too old won't be detected.

If distributions do not track individual symbols and only have a package named after SONAME (Debian) or worse after the overall package name (RedHat?), then how does recording which symbols appeared in which release solve things for the distribution makers? I can't see that it changes anything.

2. The Linux linker has no notion of backwards compatibility; the SONAME is treated as an opaque name. Using a libhts that is too old with a current samtools (for example) will crash on the first use of a newly added symbol, i.e. after some arbitrary amount of runtime.

My understanding of this is that adding symbol versioning will "solve" this by causing the application to complain about an incompatible library at the run-time link phase; so when we launch the application rather than when the function is first used. Is that correct? If so we haven't helped the user in any meaningful way. A failure is still a failure.

Note I'm not saying that we shouldn't improve things. I think we've been a bit slack in reporting which htslib versions are compatible with which samtools and bcftools packages. Stating this very clearly (and optionally also enforcing it during builds via configure checks) would greatly simplify the job of producing correct packages for the distribution vendors.

For now we've sort of run on an implicit assumption that samtools 1.X needs to compile against htslib >= 1.X, and as a run-time binary it needs the correct SONAME htslib with version >= 1.X. Now that's not entirely true, as sometimes we can get away with e.g. samtools 1.9 using a copy of htslib 1.8 (I haven't checked - this is purely an example) as it's not using any newer functions, but for safety and simplicity it's easier to assume that an equal or newer version is OK.

@jmarshall
Member

jmarshall commented Sep 15, 2022

I think the answer to both those questions — from the distribution point of view — is that it enables the complaint and failure to occur at package upgrade time. Which is well before the program/application is run/launched.

As described in #1505 (comment), if libhts.so.3 is built with symbol versioning, rpmbuild and other package building tools can extract that information and record it in the resulting package file. (Not on a per-symbol-used basis, but on a per-distinct-version-string-used basis.)

Then when a user has htslib-1.12 and samtools-1.12 installed, and types rpm -Uvh samtools-1.16.….rpm, the samtools update can fail with “your installed htslib package does not provide libhts.so.3(HTSLIB_1.14)” or so — even if no-one has bothered adding a dependency like Requires: htslib >= 1.16 formally recording the implicit assumptions you described in your last paragraph.

(In practice of course, the user is probably using dnf or another repo frontend, and will be offered both htslib-1.16 and samtools-1.16 for updating. But this will prevent them from accepting only the samtools update.)
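The upgrade-time failure described above can be modelled as a set difference between what a package requires and what the installed packages provide. The Python sketch below is a toy model, not how rpm is implemented; the capability strings mimic the `libhts.so.3(HTSLIB_1.14)` form quoted above, and the exact set of version nodes each htslib release would provide is an assumption made for illustration.

```python
def missing_requirements(requires, installed_packages):
    """Return the required capabilities that no installed package provides."""
    provided = set()
    for package in installed_packages:
        provided |= package["provides"]
    return sorted(requires - provided)


# Hypothetical capability sets for a symbol-versioned libhts.so.3.
HTSLIB_1_12 = {
    "name": "htslib-1.12",
    "provides": {"libhts.so.3", "libhts.so.3(HTSLIB_1.10)", "libhts.so.3(HTSLIB_1.12)"},
}
HTSLIB_1_16 = {
    "name": "htslib-1.16",
    "provides": HTSLIB_1_12["provides"]
    | {"libhts.so.3(HTSLIB_1.14)", "libhts.so.3(HTSLIB_1.16)"},
}

# What a package build tool might infer automatically for samtools-1.16,
# one entry per distinct version string used (again an assumption).
SAMTOOLS_1_16_REQUIRES = {"libhts.so.3", "libhts.so.3(HTSLIB_1.14)"}

if __name__ == "__main__":
    print(missing_requirements(SAMTOOLS_1_16_REQUIRES, [HTSLIB_1_12]))
    print(missing_requirements(SAMTOOLS_1_16_REQUIRES, [HTSLIB_1_16]))
```

With only the bare `libhts.so.3` capability, both library packages would look equally acceptable; the versioned capability is what lets the package manager refuse the samtools-only update.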

Whether this is less or more work to maintain than, and whether it enables more functionality than, simply adding the “this samtools needs htslib v1.X” information to samtools's README so that distributions can easily update the Requires: htslib >= 1.X dependency when they update their spec files is also a relevant question…

@jengelh

jengelh commented Sep 15, 2022

I note that GSL, GMP, MPFR, MPC, and OpenBLAS are fairly major numerics libraries that do not appear on your list.

(GMP has not added any functions since the last SONAME changes.)
GSL, MPFR, well they're doing it wrong too, but reporters don't have energy to do a 15-comment-back-and-forth ticket such as the current one on every wrongdoer. If there is no quick convincing, then a distro may force-add symvers (possibly with an overarching glob), which happened with readline/ncurses/libcfile, which trades one or two problems for one/two different ones.

$ abidiff .libs/libgsl.so.23.0.0 /tmp/libgsl.so.23.1.0  # this is gsl-2.4 vs gsl-2.5
Functions changes summary: 0 Removed, 0 Changed (35 filtered out), 157 Added functions
Variables changes summary: 0 Removed, 0 Changed (1 filtered out), 13 Added variables
$ abidiff ./src/.libs/libmpfr.so.6.0.0 /tmp/libmpfr.so.6.1.0  # mpfr-4.0.0 vs master
Functions changes summary: 0 Removed, 1 Changed (25 filtered out), 96 Added functions
Variables changes summary: 0 Removed, 0 Changed, 71 Added variables

@jkbonfield
Contributor

jkbonfield commented Sep 15, 2022

I'm sorry, but why is GSL "doing it wrong too"? Your own output shows it has 0 removed and 0 changed functions, only added ones. Maybe I'm just being thick, but I still don't understand what's actually wrong here.
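To make the "0 removed, 0 changed, only added" reading concrete, here is a small Python sketch that tallies the abidiff summary lines quoted above. The regular expression is written against those quoted lines only; other abidiff versions may format the summary differently, and a "Changed" count does not by itself prove an incompatibility, so the labels are a rough triage rather than a verdict.

```python
import re

# Matches e.g. "0 Removed, 0 Changed (35 filtered out), 157 Added".
SUMMARY = re.compile(
    r"(\d+) Removed, (\d+) Changed(?: \(\d+ filtered out\))?, (\d+) Added"
)


def tally(abidiff_text):
    """Sum removed/changed/added counts over all summary lines in the text."""
    removed = changed = added = 0
    for match in SUMMARY.finditer(abidiff_text):
        removed += int(match.group(1))
        changed += int(match.group(2))
        added += int(match.group(3))
    return removed, changed, added


def classify(abidiff_text):
    """Rough triage label based only on the summary counts."""
    removed, changed, added = tally(abidiff_text)
    if removed or changed:
        return "removed-or-changed"
    if added:
        return "additions-only"
    return "no-change"


GSL_SUMMARY = """\
Functions changes summary: 0 Removed, 0 Changed (35 filtered out), 157 Added functions
Variables changes summary: 0 Removed, 0 Changed (1 filtered out), 13 Added variables
"""

MPFR_SUMMARY = """\
Functions changes summary: 0 Removed, 1 Changed (25 filtered out), 96 Added functions
Variables changes summary: 0 Removed, 0 Changed, 71 Added variables
"""

if __name__ == "__main__":
    print("gsl:", tally(GSL_SUMMARY), classify(GSL_SUMMARY))
    print("mpfr:", tally(MPFR_SUMMARY), classify(MPFR_SUMMARY))
```

An "additions-only" result is the case where existing binaries keep working against the newer library, while binaries built against the newer library may not work against the older one.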

Using your example, say someone built their own tool linked against libgsl.so.23.0.0 (which is almost certainly using a symlink from libgsl.so.23 via the SONAME). We then update the system installed libgsl package and it replaces libgsl.so.23.0.0 with libgsl.so.23.1.0. The symlink moves, but our application still links against libgsl.so.23 and the program continues working because the new library is backwards compatible.

There is no need to boost SONAME here (as suggested in your option 1), and doing so would break this application unless we install both side by side (which is possible, but needs support from the packaging system). Also remember: not every application is installed via the package manager. That needs to be a major consideration too. (Plus it's going to be a total nightmare for things like conda.) So option 1 is a complete non-starter.

So what about your option 2 of using a linker version script. In this scenario this makes zero difference to the user, as the program works anyway. What about if we downgraded the library? If the user compiled their application using a newly-added function in libgsl.so.23.1.0 and we downgraded to libgsl.so.23.0.0 then yes there would be a runtime error. This is a genuine problem, although (conda bugs excepted) it's very rare for people to downgrade libraries instead of upgrade them. However adding symbols here in the library won't make their application work, it just gives a different error.

From your perspective of trying to make a distribution package, I now see via John's reply that it may help with some automated tools such as rpmbuild. You complain that it's taken "15-comment-back-and-forth" but maybe it would have been faster if you had replied to my questions. So far every answer I've had (and Rob too) has come from John...

If you want us to consider this, please explain what the exact problem is you are solving and how adding a symbol version file will solve it. Right now I can't see any benefit over just specifying min and max version numbers in package dependencies, which is considerably less work for us.

@daviesrob
Member

Ah, so am I right in thinking that this request stems from the openSUSE? That states:

The principle rule of versioning is that removing or changing the ABI in an incompatible way (forwards as well as backwards) requires a new, different SONAME. (With a technique called symbol versioning, the SONAME may be kept on forward-compatible changes, but that is for some other documentation to address.)

Unfortunately this is opposed to the Debian policy on shared libraries which says:

The SONAME and binary package name need not, and indeed normally should not, change if new interfaces are added but none are removed or changed, since this will not break binaries linked against the old shared library.

Currently we follow the Debian policy, doing both might be tricky.

For what it's worth, Fedora seems to be a bit wishy-washy on the subject, both in the packaging guidelines and in the notes for C and C++ packages. The part about explicit requires implies that they don't expect SONAME to change for backwards-compatible changes, but doesn't offer a good solution to the forwards-compatibility problem.

All distributions seem to like symbol versioning. Although if we did that correctly, our SONAME would never change again. I don't see how that would work with the openSUSE policy, unless a different method is used to check compatibility when versioned symbols are present?

@jkbonfield
Contributor

Then when a user has htslib-1.12 and samtools-1.12 installed, and types rpm -Uvh samtools-1.16.….rpm, the samtools update can fail with “your installed htslib package does not provide libhts.so.3(HTSLIB_1.14)” or so — even if no-one has bothered adding a dependency like Requires: htslib >= 1.16 formally recording the implicit assumptions you described in your last paragraph.

If I understand this, it solves the future-knowledge problem. We can write a package stating it needs X >= 1.10 and < 1.20 if package X 1.20 has an ABI breaking change and has been released, but before that point we obviously cannot set the upper-band on version numbers. We'd have to go back and update the previous package meta-data (which is doable, and probably should be done on any distro offering LTS to prevent incorrect upgrading). However the alternative here is needs SONAME 3 and X >= 1.10, as installing X 1.20 with SONAME 4 would invalidate it anyway so no upper-limit on version number is even needed.

Where symbol versioning has been useful for me in the past is binary compatibility between systems, either to another distro or more likely to another release of the same distro. That's equivalent to the package downgrade case I listed above.

In this situation we're not even using a package manager for the application either. We just have a user binary built/linked on system A and being run on system B, with both systems having the same SONAME. In this regard, the symbol versioning provides an immediate check of compatibility rather than a potential crash later on. I think that's the more compelling reason to do this work, but I'm still intrigued to know the actual details of @StefanBruens's problem, which is still guesswork.

@jengelh

jengelh commented Sep 15, 2022

I don't see how that would work with the openSUSE policy

Easy: 1. The current set of delivered symbols (functions, variables) stays unversioned; (or) 2. a previously unversioned symbol can gain a version without backward-ill effect; (or) 3. bump SONAME once more; (or) 4. an exception at the distro level, because things "can't get worse".

@StefanBruens
Author

All Provides are independent variables.

Using e.g. htslib 1.10, htslib 1.13 and (an imaginary) htslib 1.99:

  • 1.10 provides libhts3=1.10 (or htslib=1.10 on Fedora) and libhts.so.3
  • 1.13 provides libhts3=1.13 (or htslib=1.13 on Fedora) and libhts.so.3
  • 1.99 provides libhts4=1.99 (htslib=1.99) and libhts.so.4

When you build a package depending on libhts 1.13, you get the dependency on libhts.so.3 automatically. Without any further manual deps, htslib 1.10 would be sufficient.

Manually adding Requires: libhts3 >= 1.13 solves this problem on e.g. openSUSE, because only libhts3==1.13 satisfies both the manual and automatic dependencies (libhts3==1.16 also does, libhts4=* does not).
Doing likewise (Requires: htslib >= 1.13) on Fedora would not solve the problem. You could install htslib==1.10 and htslib==1.99 at the same time. The first one satisfies the automatic dependency (libhts.so.3), the second one the manual (htslib>=1.13).

For openSUSE, we dislike the manual approach because, well, it is manual and needs extra work in every dependent package.

For Fedora, this is not correctly solvable without introducing another manual provides.

@StefanBruens
Author

Unfortunately this is opposed to the Debian policy on shared libraries which says:

The SONAME and binary package name need not, and indeed normally should not, change if new interfaces are added but none are removed or changed, since this will not break binaries linked against the old shared library.

"Need not" and "should not" are different from "must not". Not changing is encouraged, but not required. There are dozens of projects which bump SOVERSION on each release, although the ABI has not changed.

@jkbonfield
Contributor

jkbonfield commented Sep 16, 2022

Manually adding Requires: libhts3 >= 1.13 solves this problem on e.g. openSUSE, because only libhts3==1.13 satisfies both the manual and automatic dependencies (libhts3==1.16 also does, libhts4=* does not). Doing likewise (Requires: htslib >= 1.13) on Fedora would not solve the problem. You could install htslib==1.10 and htslib==1.99 at the same time. The first one satisfies the automatic dependency (libhts.so.3), the second one the manual (htslib>=1.13).

For openSUSE, we dislike the manual approach because, well, it is manual and needs extra work in every dependent package.

For Fedora, this is not correctly solvable without introducing another manual provides.

Are you saying that Fedora has no way of adding an SONAME dependency? I'm not talking about package names (i.e. libhts vs libhts3 naming), but the SONAME field in the binary. I was assuming you could say package X depends on libhts >= 1.10 && SONAME == "libhts.so.3".

$  objdump -x libhts.so.3|grep SONAME
  SONAME               libhts.so.3

All the information is already there. Adding more version fields would help certain things such as automatically knowing when to upgrade htslib rather than relying on correctly listing it in dependencies, but if the package manager chooses to not put the soname into the package name and doesn't have an alternative such as the basic task of reading SONAME then frankly "they're doing it wrong".

That's not to say symbol versioning wouldn't help, but I'm genuinely baffled as to why things as they currently stand make it impossible to do reliable package management on Fedora.

@StefanBruens
Author

Of course Fedora adds the SONAME (automatically). But, referring to my previous example:

  • Package X requires htslib >= 1.13 (manually specified) and libhts.so.3 (autogenerated).
  • System has htslib == 1.10 and htslib == 1.99 installed

-> Both dependencies are satisfied, but X will crash.
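The scenario above can be reproduced with a toy resolver in Python. It assumes, as stated earlier in the thread, that every Provides is independent, so each Requires may be satisfied by a different installed package. The data structures and function names are invented for this sketch and do not reflect rpm internals.

```python
def version_key(text):
    """Turn a dotted version such as '1.13' into a comparable tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def requirement_met(requirement, installed):
    """True if ANY installed package satisfies this one requirement."""
    name, minimum = requirement
    for package in installed:
        for provided_name, provided_version in package["provides"]:
            if provided_name != name:
                continue
            if minimum is None:
                return True
            if provided_version is not None and version_key(provided_version) >= version_key(minimum):
                return True
    return False


def all_requirements_met(requirements, installed):
    """What the toy resolver checks: each requirement on its own."""
    return all(requirement_met(req, installed) for req in requirements)


def single_package_meets_all(requirements, installed):
    """What package X actually needs: one library satisfying everything."""
    return any(all_requirements_met(requirements, [pkg]) for pkg in installed)


# The two coinstalled Fedora-style packages from the example above.
HTSLIB_1_10 = {"provides": [("htslib", "1.10"), ("libhts.so.3", None)]}
HTSLIB_1_99 = {"provides": [("htslib", "1.99"), ("libhts.so.4", None)]}

# Package X: manual "htslib >= 1.13" plus autogenerated "libhts.so.3".
PACKAGE_X_REQUIRES = [("htslib", "1.13"), ("libhts.so.3", None)]

if __name__ == "__main__":
    installed = [HTSLIB_1_10, HTSLIB_1_99]
    print(all_requirements_met(PACKAGE_X_REQUIRES, installed))
    print(single_package_meets_all(PACKAGE_X_REQUIRES, installed))
```

The first check passes because htslib 1.99 covers the version requirement and htslib 1.10 covers the SONAME requirement, while the second check shows that no single installed library covers both.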

@jkbonfield
Contributor

jkbonfield commented Sep 16, 2022

Thank you. I understand this now.

So it's simply an issue that the X package has incomplete dependencies as it hasn't correctly specified the minimum version of htslib it needs. As I said before, we can make this more explicit for samtools to aid packaging, but we obviously can't do anything for external generic package X.

Is it the case that if we use symbol versioning, and the dependency in X becomes a "provides symbol-set Y" rather than "release > Z" type of dependency, that this is automated via objdump, nm, etc? I think other distributions automate this already even without explicit symbol versioning by maintaining their own list of symbols per package version, but I assume that machinery is absent in Fedora and/or openSUSE.

We can look into adding symbol versioning, but I wouldn't want it unless we can automate it as otherwise we're almost certainly going to have an accident at some stage, and explicit-but-incorrect metadata is going to be worse than no metadata. Given we already have strict symbol visibility macros, I think automation is doable via some sort of nm vs existing map file analysis to see what's exported and identify when the new library has additional symbols vs the old one (without explicitly having a copy of that old library to hand), but I don't know how long it'd take to write and debug such a system. If you know of a good starting point or have ideas then we'd be amenable to PRs.
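As a possible starting point for the "nm vs existing map file" analysis, the Python sketch below compares symbol names taken from `nm -D --defined-only`-style text against the names listed in a version script. It is an assumption-laden illustration: it only recognises the `T`, `D`, `B` and `R` symbol types, ignores wildcard patterns in the map file, and uses placeholder symbol names rather than real HTSlib exports.

```python
import re


def exported_symbols(nm_output):
    """Collect defined global symbol names from nm-style '<addr> <type> <name>' lines."""
    symbols = set()
    for line in nm_output.splitlines():
        fields = line.split()
        if len(fields) == 3 and fields[1] in {"T", "D", "B", "R"}:
            # Strip any '@VERSION' or '@@VERSION' suffix nm may print.
            symbols.add(fields[2].split("@")[0])
    return symbols


def symbols_in_map(map_text):
    """Collect plain symbol names listed as 'name;' lines in a version script."""
    pattern = re.compile(r"^[ \t]*([A-Za-z_][A-Za-z0-9_]*)[ \t]*;", re.MULTILINE)
    return {match.group(1) for match in pattern.finditer(map_text)}


def compare(nm_output, map_text):
    """Return (unlisted, stale): exported but not in the map, and the reverse."""
    exported = exported_symbols(nm_output)
    listed = symbols_in_map(map_text)
    return sorted(exported - listed), sorted(listed - exported)


# Hypothetical inputs for illustration only.
NM_OUTPUT = """\
0000000000001000 T example_open
0000000000001100 T example_close
0000000000001200 T example_new_feature
                 U malloc
"""

MAP_TEXT = """\
HTSLIB_1.0 {
    global:
        example_open;
        example_close;
        example_removed;
    local:
        *;
};
"""

if __name__ == "__main__":
    unlisted, stale = compare(NM_OUTPUT, MAP_TEXT)
    print("needs a version node:", unlisted)
    print("listed but no longer exported:", stale)
```

Because the existing map file already records what earlier releases exported, a check like this would not need a copy of the old library; unlisted symbols are candidates for the next version node, and stale entries would flag a possible ABI break.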

@jmarshall
Member

Here's a draft of what htslib.map might look like for HTSlib. Patch Makefile as follows (eventually this should be probed for in configure.ac), and be sure to link htsfile/samtools/etc against the shared libhts.so to try it out:

@@ -343,8 +343,8 @@
 # As a byproduct invisible to make, libhts.so.NN is also created, as it is the
 # file used at runtime (when $LD_LIBRARY_PATH includes the build directory).
 
-libhts.so: $(LIBHTS_OBJS:.o=.pico)
-	$(CC) -shared -Wl,-soname,libhts.so.$(LIBHTS_SOVERSION) $(LDFLAGS) -o $@ $(LIBHTS_OBJS:.o=.pico) $(LIBS) -lpthread
+libhts.so: $(LIBHTS_OBJS:.o=.pico) htslib.map
+	$(CC) -shared -Wl,-soname,libhts.so.$(LIBHTS_SOVERSION) -Wl,-version-script,$(srcprefix)htslib.map $(LDFLAGS) -o $@ $(LIBHTS_OBJS:.o=.pico) $(LIBS) -lpthread
 	ln -sf $@ libhts.so.$(LIBHTS_SOVERSION)

Happily a large part of trimming this list is already done by the symbol visibility work. It would be good to regenerate this via an independently produced hacky nm-output-based script and compare the drafts. And to vet this list carefully.

@jkbonfield
Contributor

jkbonfield commented Sep 20, 2022

Many thanks for this John. It's a great start to backfill the historical data. We'll look at automation to keep it up to date via nm/awk hackery or similar. (And of course with automation to try it out on historical versions.)

@StefanBruens
Author

Thanks for all the work! Distribution work would be so much easier if every project took such concerns seriously.
