specify RPATH when linking to libpq #225

davepacheco · 2021-09-08T23:41:18Z

See #213 for context.

I don't think we should land this PR, but I wanted to keep it here for future reference. This was my best attempt at a generalizable pattern for linking to a native library that's not on the runtime linker's default search path. The basic idea is to have the *-sys crate that wraps the native library expose metadata that indicates where it found the native library. Then we use this at the "top level" (here in omicron) to pass an appropriate -R flag to the linker (wrapped in a -W flag because we're really passing it to gcc). This change depends on the "oxide" branch of https://github.com/oxidecomputer/pq-sys, which adds that metadata. (This dependency shows up as a "patch" section in the top level Cargo.toml.)

The good things about this approach:

We're still able to leverage the pq-sys crate's logic for locating the library and the resulting RPATH entry is necessarily the same.
The logic that adds the RPATH is at the top level, not in pq-sys. That's where it belongs -- see this comment.
This approach gives us the flexibility to decide that official builds will put the library at some known place on the deployed system, even if that doesn't match what where it is on the build machine. (We'd need to update the top-level build script logic to do this, and then pass the env var to pq-sys that causes it to find the library we want, but this seems straightforward.)

There's a bunch of ugly here:

For every omicron crate that this applies to, I had to add a build.rs script that checks for the pq-sys metadata and uses the cargo:rustc-link-arg option to emit the -R flag that we want.
The cargo:rustc-link-arg option is currently unstable, so we need to be on "nightly". It's stabilized in Rust 1.56, which is a couple of months away.
Metadata emitted by Cargo build scripts is only available to immediate dependents. We don't depend on pq-sys -- we depend on Diesel. To work around this, I added dependencies on pq-sys wherever we're currently depending on diesel. This relies on our dependency matching Diesel's, too -- if Diesel updated to one that was incompatible with our semver, I'm not sure what would happen.
All of this is copy/pasted across most of the crates in this repo. That's because omicron-common depends on diesel (see Use fewer features for the diesel dependency in omicron-common #221?), and most other things depend on omicron-common. We're deliberately trying to set RPATH at the top level, so naturally all of the affected top-level crates need this.

If we were to go with this approach, we could:

cleanup the duplicated logic in the build.rs files by putting it into a common crate and using that as a build-dependency
fix the XXX comments to explain the situation.
double-check the "nightly" toolchain we're using. As I write this up, I realize that I removed the notes about the unstable "asm" feature because my build worked without them, but that may be because I wasn't actually using propolis.

The real killer, though, is that "cargo:rustc-link-arg" is not used when building rustdoc tests. The net result is that builders of omicron still need to set RUSTDOCFLAGS if they want to run the test suite. All this ugliness just prevents them from having to set RUSTFLAGS.

…ngs with that branch

davepacheco · 2021-09-14T23:58:31Z

oximeter/oximeter/src/oximeter_server.rs

@@ -145,8 +145,10 @@ async fn collection_task(
 struct CollectionTask {
    // Channel used to send messages from the agent to the actual task. The task owns the other
    // side.
+    #[allow(dead_code)]


@bnaecker I added this here because these unused fields were generating clippy warnings, which were treated as errors by the clippy-lint target. It looks like these are unused but I wasn't sure what the right answer was here. It'd be good to remove these allow directives, one way or another.

I'm not sure why these only generated warnings now. I did update to a newer nightly, and maybe that one has different behavior here.

No problem, thanks for cc-ing me. I'm not sure why they never triggered errors before, either, but the updated toolchain seems likely.

These fields are there so that we can communicate with the collection task. For example, updating the interval on which metrics are collected from a producer is accomplished by sending something on the inbox channel. The join handle might be used if we want to stop collecting metrics from a producer.

It doesn't seem like either one is actually used in the code today, though. At least, I think that's what clippy is saying and I couldn't find a case that contradicted it.

davepacheco · 2021-09-15T00:04:33Z

After exploring the static linking approach under #234, I've come back around and think this is the most promising direction.

The problem I mentioned earlier regarding rustdoc tests is rust-lang/cargo#9895. It appears confirmed as a bug and I believe someone is working on fixing it. With that fixed, this will be a complete solution, even if it's ugly. In the meantime, we can just tell people to set RUSTDOCFLAGS. This is annoying, but at least the failure mode is only that test binaries are broken.

In terms of the ugliness: on @cbiffle's suggestion I refactored the build script logic into a new omicron-rpaths package in the same workspace. The remaining ugliness is (1) that nearly every crate in the workspace grows a dummy dependency on pq-sys in order to get its build script metadata, and (2) that same list of crates also grows an identical, tiny build.rs that just calls out to omicron-rpaths, and (3) we're on nightly until 1.56, but we were anyway for "asm".

The only thing here that I know needs to be updated before landing this change is the patch in the top level Cargo.toml. We should probably update the "oxide" branch in pq-sys and use that instead.

davepacheco · 2021-09-15T00:07:54Z

Pending the CI checks and the change I mentioned in my last comment, I think this is ready for review.

Reviewers: I would suggest:

start with the block comment in omicron_rpaths/src/lib.rs
then check everything else in this PR (which is comparatively tiny)
then take a look at the pq-sys change: oxidecomputer/pq-sys@oxide...oxidecomputer:omicron

bnaecker · 2021-09-15T03:26:44Z

Yes, that’s right. Apologies if I wasn’t clear, but I was trying to say that these are here to support future functionality, so the allow attribute to appease clippy is fine IMO.

…

On 14 Sep 2021, at 19:39, David Pacheco ***@***.***> wrote: @davepacheco commented on this pull request. In oximeter/oximeter/src/oximeter_server.rs: > @@ -145,8 +145,10 @@ async fn collection_task( struct CollectionTask { // Channel used to send messages from the agent to the actual task. The task owns the other // side. + #[allow(dead_code)] It doesn't seem like either one is actually used in the code today, though. At least, I think that's what clippy is saying and I couldn't find a case that contradicted it. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or unsubscribe. Triage notifications on the go with GitHub Mobile for iOS or Android.

smklein

Thank you for working in such depth to fix this. Frankly I had no idea this linkage would be so hairy prior to reading your work in #213 , but hopefully this makes life easier when we inevitably need to link against another native library in the future.

@luqmana I'm super interested in your thoughts here - from my point of view, this whole case is a good reason why rpaths should become a first-class citizen in cargo. Do you think there's anything we could/should do to signal boost this to the cargo team?

I still think an "ideal solution" - which would likely require language/cargo changes to be supported - would be one where a "-sys" crate can define explicit rpaths that are transitively picked up by anyone who depends, directly or indirectly, on that "-sys" crate. That said, I understand the concern about "implicit changes to linker-flags", but having a section like:

[build]
transitive_rpath = true

In a workspace seems much more natural than altering the dependency hierarchy to acquire / act on "DEP_" variables.

Regardless, thank you @dap - this solution looks like it'll cover our native linking use cases while protecting us from unexpected missing rpaths for the foreseeable future.

smklein · 2021-09-15T14:56:33Z

omicron-rpaths/src/lib.rs

+ * So we need to emit the linker argument here.  How do we know what value to
+ * use?  We take the approach used by most build systems: we use the path where
+ * the library was found on the build machine.  But how do we know where that
+ * was?  *-sys packages have non-trivial mechanisms for locating the desired
+ * library.  We don't want to duplicate those here.  Instead, we make use of
+ * metadata emitted by those build scripts, which shows up as "DEP_*"
+ * environment variables for us.
+ *
+ * **Important note:** In order for us to have metadata for these dependencies,
+ * we must *directly* depend on them.  This may mean adding an otherwise unused
+ * dependency from the top-level package to the *-sys package.


Interesting hack - I recall reading about the "DEP_" variables and playing around with them, but thought this was a non-starter because of the lack of transitiveness. Good idea just adding the direct dependency!

smklein · 2021-09-15T15:02:15Z

omicron-rpaths/src/lib.rs

+ * expect these to always be set, and if they're not, it's most likely that
+ * somebody has forgotten to include a required dependency.  We want to tell
+ * them that rather than silently produce unrunnable binaries.


At build time especially, this seems like a reasonable inclination.

luqmana

Thank you @davepacheco for the incredible depth in researching all the possible solutions! I think this is a reasonable approach right now barring any specific support from cargo/rust. Mirroring @smklein sentiments, this got incredibly hairy fast.

I think it'll probably be some time before we can realistically expect a first-class cargo solution there given the limited bandwidth of the cargo team (correct me if I'm wrong @steveklabnik). To really get it accelerated I'd think maybe writing up a pre-RFC and posting to rust-internals would be good. Our own issues and the ones linked above definitely indicate this isn't just a pain point for us. If we can come up with a reasonably scoped surface I'd be happy to help implement/drive it forward.

davepacheco · 2021-09-15T22:40:25Z

Thanks @luqmana and @smklein for taking a look and for all the help along the way.

I'd love to move the Cargo solution forward, too. There's a lot of good context in the Cargo issue. I'm not sure we have consensus internally on the best approach. It might be helpful to write down the goals and see if we can't find a common solution. I think @smklein wants a package to be able to encapsulate its RPATH requirements (correct me if that's wrong), and other folks (both on the Cargo issue and internally) want to specify this from the top level build. It'd help to write down the use cases where each is desirable. I expect we can come up with a solution where, for folks using pkg-config and its ilk that assume the build machine mirrors the production machine, this all "just works" (i.e., top-level builders don't need to do anything), while still allowing the folks who want to control this from the top level to do so.

smklein · 2021-09-15T23:15:05Z

I'd love to move the Cargo solution forward, too. There's a lot of good context in the Cargo issue. I'm not sure we have consensus internally on the best approach. It might be helpful to write down the goals and see if we can't find a common solution. I think @smklein wants a package to be able to encapsulate its RPATH requirements (correct me if that's wrong), and other folks (both on the Cargo issue and internally) want to specify this from the top level build. It'd help to write down the use cases where each is desirable. I expect we can come up with a solution where, for folks using pkg-config and its ilk that assume the build machine mirrors the production machine, this all "just works" (i.e., top-level builders don't need to do anything), while still allowing the folks who want to control this from the top level to do so.

I spent a while chatting with @rmustacc on this topic today, as I understand it:

The advantage of specifying the rpath at a binary / top-level point-of-view is that, at the point a binary is being produced, the target machine should be "known" - as should the target rpath. After all, the .dynamic section of the ELF file needs to be filled with something at this point. I'm definitely convinced that a binary which depends on pq-sys should be able to influence the rpath in a way that, in isolation, pq-sys may not be able to do on its own. One solution is very similar to the result of this PR: all top-level targets explicitly specify the native libraries on which they depend, even indirectly.

I 100% understand the need for top-level targets to be able to determine where rpaths should be uncovered - but it still seems like a bummer that these targets need to basically have hard-coded knowledge about all used native libraries, even if they're the result of a transitive crate dependency. That's the key part I'm wondering about - if the top-level build was responsible for specifying the rpath, but internal "*-sys" crates could emit the required rpaths in some well-known format that could be aggregated somewhere, I think this could accomplish the same goal.

Like, right now, the DEP_PQ_LIBDIRS environment variable gets passed from the pq-sys crate up to the top-level crates, but only because:

The top-level crates directly depend on pq-sys (artificially! just to see the environment variable)
omicron-rpaths has hardcoded this environment variable (within RPATH_ENV_VARS).

I could imagine a feature where - similar to how how the DEP_* variables are propagated upward from dependencies - a set of rpaths are bubbled up transitively from dependencies as an environment variable.

The top-level build could opt-in to using this environment variable (... or whatever mechanism cargo prefers), passing it to the linker, but IMO this would have some advantages over what we're being forced to do now:

No more direct dependencies on the "*-sys" crates! "omicron-nexus" wouldn't need to enumerate pq-sys explicitly, it could just indicate that "for each RPATH in TRANSITIVE_RPATHS, pass 'em to cargo:rustc-link-arg=-Wl,-R...".
Dependencies could change which "*-sys" crates they're using, and the top-level targets wouldn't need to be modified (explicitly).

This would basically change the framing of the problem to: "when you build a top-level target, you see all the rpaths recommended by your dependencies. What do you wanna do with 'em?"

With this setup, there would be the option to customize behavior arbitrarily, but also the potential for really concise semantic sugar to say "... yeah, give them all to the linker".

davepacheco · 2021-09-15T23:36:50Z

I think the basic idea makes sense. It's the details I'm not sure about: exactly what information do the *-sys crates expose, how do they get that information, how do they expose it, how do top level crates see that, and how do top level crates use it to control the final RPATH?

I still think it's essential that we articulate the two different use cases. Correct me if this is wrong, but I believe the one you're describing is: "my build machine matches my production machine, and I don't want to the top level crate to have to think about the RPATHs for native libraries that are pulled in by transitive dependencies." I think that's a common goal and important to support.

I'd like to have a concrete example of what I'll call the "release engineering" case. A simple way to package up Nexus might be to deliver /opt/oxide/bin/nexus with an RPATH pointing to /opt/ooce/pgsql-13/lib/amd64 so it can find libpq. That's easy, though. More interesting would be if we wanted to avoid depending on /opt/ooce and instead build an artifact that contains:

/opt/oxide/bin/nexus (RPATH: $ORIGIN/../lib)
/opt/oxide/lib/libpq.so

Now suppose some Nexus saga wants to use pg_dump (a PostgreSQL command built on libpq), and for whatever reason it needs a newer (or older, or just different) build of libpq. We might want to ship:

/opt/oxide/bin/nexus (RPATH: $ORIGIN/../lib/nexus)
/opt/oxide/bin/pg_dump (RPATH: $ORIGIN/../lib/pg_dump)
/opt/oxide/lib/nexus/libpq.so
/opt/oxide/lib/pg_dump/libpq.so

(openssl might be a better example than libpq.)

Is that the release engineering case? Is that too contrived?

smklein · 2021-09-16T16:01:45Z

I think that's a super reasonable case - and while I want to make the "host = target" case easy, anything proposed should definitely be able to support that case of "the target machine may not look like my own".

This is super handwavey, but I think the case you mentioned could basically still be handled effectively if the rpath information emitted by "*-sys" crates was more structured.

As an example, suppose the "*-sys" crate could emit:

"This is a native library I need (by name, not path)"
"This is the path to the native library (on your host machine)"

At a top-level, a resolver in the build.rs file could be used to identify "when you're building this binary, you need libpq.so. Here's where you should look for that path on your target machine". This "resolver" would be bespoke for whatever target environment we're talking about - as an example, this could be where, for helios, we say "look in /opt/oxide/wherever" - but presumably that mapping of "libpq.so -> path on target to libpq.so" needs to happen somewhere.

Regarding "how does it get exposed up", I dunno. The rust book section on native libraries suggests the "key-value pairs" are basically the go-to.

Also, regarding "*-sys packages", the book mentions the following:

The set of *-sys packages provides a common set of dependencies for linking to native libraries. There are a number of benefits earned from having this convention of native-library-related packages:

Common dependencies on foo-sys alleviates the rule about one package per value of links.

Other -sys packages can take advantage of the DEP_NAME_KEY=value environment variables to better integrate with other packages. See the "Using another sys crate" example.

A common dependency allows centralizing logic on discovering libfoo itself (or building it from source).

These dependencies are easily overridable.

The language here implies - to me, at least - that the goal of the "*-sys" packages is to provide a single, centralized location regarding "how do we find the underlying native library" - or in their words:

centralizing logic on discovering libfoo itself (or building it from source)

davepacheco added 5 commits September 3, 2021 17:15

build fails due to missing libpq (#213)

10f12e2

pick specific nightly and update README

e289a6c

WIP: #213

391f7ee

update comment on pq-sys in Cargo.toml

2b016ff

update Cargo.lock

b3eb058

davepacheco mentioned this pull request Sep 9, 2021

build fails due to missing libpq #213

Closed

davepacheco added 2 commits September 10, 2021 15:05

lock to pq-sys commit rather than oxide branch so we can do other thi…

16c9950

…ngs with that branch

fix pq-sys reference

a4ef242

This was referenced Sep 10, 2021

statically link libpq (fixes #213) #233

Merged

statically link libpq (fixes #213) #234

Closed

davepacheco added 7 commits September 14, 2021 15:56

factor out build script logic and document all this

00dd5dd

skip MacOS

919bfea

silence clippy

d652ddc

update rust-toolchain comment

af2c3dd

Merge branch 'main' into issue-213-2

f378594

update Cargo.lock for merge

53242ab

markdown is not asciidoc

928c965

davepacheco commented Sep 14, 2021

View reviewed changes

davepacheco requested review from smklein and luqmana September 15, 2021 00:07

change pq-sys branch

38ffe99

smklein approved these changes Sep 15, 2021

View reviewed changes

davepacheco mentioned this pull request Sep 15, 2021

clean up build warnings #244

Merged

luqmana approved these changes Sep 15, 2021

View reviewed changes

davepacheco marked this pull request as ready for review September 15, 2021 22:43

davepacheco merged commit e34626f into main Sep 15, 2021

davepacheco deleted the issue-213-2 branch September 15, 2021 22:44

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

specify RPATH when linking to libpq #225

specify RPATH when linking to libpq #225

davepacheco commented Sep 8, 2021 •

edited

Loading

davepacheco Sep 14, 2021 •

edited

Loading

bnaecker Sep 15, 2021

davepacheco Sep 15, 2021

davepacheco commented Sep 15, 2021

davepacheco commented Sep 15, 2021 •

edited

Loading

bnaecker commented Sep 15, 2021 via email

smklein left a comment

smklein Sep 15, 2021

smklein Sep 15, 2021

luqmana left a comment

davepacheco commented Sep 15, 2021

smklein commented Sep 15, 2021

davepacheco commented Sep 15, 2021

smklein commented Sep 16, 2021

specify RPATH when linking to libpq #225

specify RPATH when linking to libpq #225

Conversation

davepacheco commented Sep 8, 2021 • edited Loading

davepacheco Sep 14, 2021 • edited Loading

Choose a reason for hiding this comment

bnaecker Sep 15, 2021

Choose a reason for hiding this comment

davepacheco Sep 15, 2021

Choose a reason for hiding this comment

davepacheco commented Sep 15, 2021

davepacheco commented Sep 15, 2021 • edited Loading

bnaecker commented Sep 15, 2021 via email

smklein left a comment

Choose a reason for hiding this comment

smklein Sep 15, 2021

Choose a reason for hiding this comment

smklein Sep 15, 2021

Choose a reason for hiding this comment

luqmana left a comment

Choose a reason for hiding this comment

davepacheco commented Sep 15, 2021

smklein commented Sep 15, 2021

davepacheco commented Sep 15, 2021

smklein commented Sep 16, 2021

davepacheco commented Sep 8, 2021 •

edited

Loading

davepacheco Sep 14, 2021 •

edited

Loading

davepacheco commented Sep 15, 2021 •

edited

Loading