Package binary configurations and CPS file location proposal #58

memsharded · 2024-03-27T19:10:45Z

memsharded
Mar 27, 2024

Proposal

It should be possible to tell the build systems exactly what CPS file is used for any given dependency, or for all of them.

In addition to the package search mechanism described in the current spec (https://cps-org.github.io/cps/searching.html), it should also be possible to tell the build system exactly where to find the .cps files - without relying on a filesystem search, if the user so chooses (or if a package manager or other tool is able to unequivocally generate the paths to all required dependencies).

Because transitive dependencies are not referenced directly by the build scripts, it is not enough to spell out paths there with something like load_cps_file(/full/path/to/foo.cps). Instead, it should be possible to supply a mapping from each dependency to the exact location of its CPS file on disk, whenever those locations are known to the user or can be provided by another tool such as a package manager.

Conceptually something like:

    foo => /path/to/mydeps/foo.cps
    bar => /path/to/systemdeps/bar.cps
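
To illustrate why a name-to-path mapping is enough to cover transitive dependencies, here is a minimal sketch. The function `resolve_transitive` and the `load_cps` callback are hypothetical names, not part of any tool, and the sketch assumes each CPS file lists the packages it depends on under a top-level `requires` object.

```python
from typing import Callable, Dict, Iterable


def resolve_transitive(
    direct: Iterable[str],
    mapping: Dict[str, str],
    load_cps: Callable[[str], dict],
) -> Dict[str, str]:
    """Return {package name: cps path} for direct and transitive dependencies."""
    resolved: Dict[str, str] = {}
    pending = list(direct)
    while pending:
        name = pending.pop()
        if name in resolved:
            continue
        if name not in mapping:
            raise KeyError(f"no CPS location mapped for '{name}'")
        path = mapping[name]
        resolved[name] = path
        # Transitive names come from the CPS file itself, not from the build script.
        pending.extend(load_cps(path).get("requires", {}))
    return resolved


# In-memory stand-ins for files on disk (illustrative contents only).
FAKE_FILES = {
    "/path/to/mydeps/foo.cps": {"name": "foo", "requires": {"bar": {}}},
    "/path/to/systemdeps/bar.cps": {"name": "bar"},
}
MAPPING = {
    "foo": "/path/to/mydeps/foo.cps",
    "bar": "/path/to/systemdeps/bar.cps",
}

# The build script only names "foo"; "bar" is discovered through foo.cps.
solution = resolve_transitive(["foo"], MAPPING, FAKE_FILES.__getitem__)
```

The build script never mentions "bar", yet its location is still fully determined by the mapping.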

As a corollary for multi-configuration systems:

For multi-configuration build systems such as Visual Studio, Xcode, or Ninja Multi-Config, in which more than one configuration can be set up in a single pass, it should be possible to tell the build system exactly which CPS file is used for any given dependency, or for all of them, for every configuration.

In practice this will translate to being able to provide a different mapping for each one of the configurations.

Tools should not require packages to provide all configurations, nor require that all configurations exist within the same prefix.

As an implementation hint, the proposal could be implemented with a single “cps-map.json” file that contains the configurations and the locations:

{
    "Debug": {
        "zlib": "/path/to/debug/zlib.cps",
        "boost": "/path/to/debug/boost.cps"
    },
    "Release": {
        "zlib": "/path/to/any/other/arbitrary/zlib.cps",
        "boost": "/some/place/boost.cps"
    },
    "DebugAsan": {
        "zlib": "/path/to/asan/zlib.cps",
        "boost": "/path/to/asan/boost.cps"
    }
}
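
A consuming tool could read such a file with very little code. The following is a sketch under stated assumptions, not a reference implementation: the helper name `cps_path` is hypothetical, and raising an error for a missing entry is just one possible policy.

```python
import json

# Same shape as the single-file "cps-map.json" above (two configurations shown).
CPS_MAP_JSON = """
{
    "Debug": {
        "zlib": "/path/to/debug/zlib.cps",
        "boost": "/path/to/debug/boost.cps"
    },
    "Release": {
        "zlib": "/path/to/any/other/arbitrary/zlib.cps",
        "boost": "/some/place/boost.cps"
    }
}
"""


def cps_path(cps_map: dict, configuration: str, package: str) -> str:
    """Return the mapped .cps path, or raise LookupError if there is no entry."""
    try:
        return cps_map[configuration][package]
    except KeyError:
        raise LookupError(
            f"no CPS file mapped for '{package}' in configuration '{configuration}'"
        ) from None


cps_map = json.loads(CPS_MAP_JSON)
release_boost = cps_path(cps_map, "Release", "boost")
```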

Or several files, one per configuration:

cps-map-Debug.json

{
    "zlib": "/path/to/debug/zlib.cps",
    "boost": "/path/to/debug/boost.cps"
}

cps-map-Release.json

{
    "zlib": "/path/to/any/other/arbitrary/zlib.cps",
    "boost": "/some/place/boost.cps"
}

cps-map-DebugAsan.json

{
    "zlib": "/path/to/asan/zlib.cps",
    "boost": "/path/to/asan/boost.cps"
}
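
The two layouts carry the same information. As a rough sketch, per-configuration files could be folded back into the single-file shape as shown below. It assumes the configuration name is whatever follows the `cps-map-` prefix in the file name, and `load_split_maps` is a hypothetical helper.

```python
import json
import tempfile
from pathlib import Path

PREFIX = "cps-map-"


def load_split_maps(directory: Path) -> dict:
    """Combine cps-map-<configuration>.json files into {configuration: {name: path}}."""
    combined = {}
    for path in sorted(directory.glob(PREFIX + "*.json")):
        # "cps-map-Debug.json" -> stem "cps-map-Debug" -> configuration "Debug"
        configuration = path.stem[len(PREFIX):]
        combined[configuration] = json.loads(path.read_text(encoding="utf-8"))
    return combined


# Demonstration in a temporary directory, using entries from the examples above.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "cps-map-Debug.json").write_text(
        json.dumps({"zlib": "/path/to/debug/zlib.cps"}), encoding="utf-8"
    )
    (root / "cps-map-Release.json").write_text(
        json.dumps({"zlib": "/path/to/any/other/arbitrary/zlib.cps"}), encoding="utf-8"
    )
    combined = load_split_maps(root)
```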

Build systems and tools remain free to use automatic strategies for locating CPS files, and this proposal fits alongside them without problems: the automatic search could simply be skipped when an entry already exists in the mapping. This proposal also recommends that tools which search for CPS files automatically output such a mapping file as a result, for improved debuggability and user experience.
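
A minimal sketch of that hybrid behavior follows. Everything here is an assumption made for illustration: `locate_all` is a hypothetical function, `fake_search` stands in for whatever search a real tool performs, and the path it returns is made up.

```python
import json
from typing import Callable, Dict, Iterable, Optional


def locate_all(
    names: Iterable[str],
    explicit: Dict[str, str],
    search: Callable[[str], Optional[str]],
) -> Dict[str, str]:
    """Resolve each name, preferring explicit entries over the search callback."""
    located: Dict[str, str] = {}
    for name in names:
        path = explicit.get(name)
        if path is None:
            # No explicit entry: only now fall back to the tool's own search.
            path = search(name)
        if path is None:
            raise LookupError(f"'{name}' is neither mapped nor found by search")
        located[name] = path
    return located


def fake_search(name: str) -> Optional[str]:
    """Stand-in for a filesystem search; the path is made up for illustration."""
    return {"boost": "/found/by/search/boost.cps"}.get(name)


explicit_map = {"zlib": "/path/to/debug/zlib.cps"}
located = locate_all(["zlib", "boost"], explicit_map, fake_search)

# Emit the final result in the same shape, so users can inspect or reuse it.
report = json.dumps(located, indent=4, sort_keys=True)
print(report)
```

Whether a missing entry should be a hard error or trigger a search is a policy choice for the tool; the sketch picks one option only to stay short.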

Current CPS status and proposed functionality

The current CPS specification assumes in several places that different binary configurations will always be available in the same tree. This would be a blocker in many of the situations described in the Rationale section below.

The sample CPS file at https://cps-org.github.io/cps/sample.html contains:

  "configurations": [ "optimized", "debug" ],
  "default_components": [ "sample" ],
  "components": {
    "sample-shared": {
      "type": "dylib",
      "configurations": {
        "optimized": {
          "location": "@prefix@/lib64/libsample.so.1.2.0"
        },
        "debug": {
          "location": "@prefix@/lib64/libsample_d.so.1.2.0"
        }
      }
    },

Assuming this is a “sample.cps” file, the strategy for locating and using it presumes that it lives in a single location, with all of its configurations under one subtree.

Furthermore, the configuration-merging section (https://cps-org.github.io/cps/configurations.html#configuration-merging) describes how different configurations must be merged and states that a naming convention like name:*.cps should be followed.

The current proposal means that the following structure must also be supported:

/path/to/myoptimize/sample.cps

  "configurations": [ "optimized" ],
  "default_components": [ "sample" ],
  "components": {
    "sample-shared": {
      "type": "dylib",
      "configurations": {
        "optimized": {
          "location": "@prefix@/lib64/libsample.so.1.2.0"
        }
      }
    },

/other/path/to/sample.cps

  "configurations": [ "debug" ],
  "default_components": [ "sample" ],
  "components": {
    "sample-shared": {
      "type": "dylib",
      "configurations": {
        "debug": {
          "location": "@prefix@/lib64/libsample.so.1.2.0"
        }
      }
    },

Build systems could then use these specific files, without any automatic search, through an explicit mapping like:

cps-map.json

{
    "optimized": {
        "sample": "/path/to/myoptimize/sample.cps"
    },
    "debug": {
        "sample": "/other/path/to/sample.cps"
    }
}
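
As a rough illustration of how a consumer might follow this map, the sketch below keeps the two CPS files as in-memory dictionaries keyed by their paths. The helper `component_location` is hypothetical, and the `@prefix@` placeholder is deliberately left unresolved.

```python
# Illustrative stand-ins for the two single-configuration files, keyed by path.
CPS_FILES = {
    "/path/to/myoptimize/sample.cps": {
        "configurations": ["optimized"],
        "components": {
            "sample-shared": {
                "type": "dylib",
                "configurations": {
                    "optimized": {"location": "@prefix@/lib64/libsample.so.1.2.0"}
                },
            }
        },
    },
    "/other/path/to/sample.cps": {
        "configurations": ["debug"],
        "components": {
            "sample-shared": {
                "type": "dylib",
                "configurations": {
                    "debug": {"location": "@prefix@/lib64/libsample.so.1.2.0"}
                },
            }
        },
    },
}

CPS_MAP = {
    "optimized": {"sample": "/path/to/myoptimize/sample.cps"},
    "debug": {"sample": "/other/path/to/sample.cps"},
}


def component_location(configuration: str, package: str, component: str):
    """Follow the map to one CPS file and read the component's location from it."""
    cps_file = CPS_MAP[configuration][package]
    cps = CPS_FILES[cps_file]
    if configuration not in cps["configurations"]:
        raise LookupError(f"{cps_file} does not provide '{configuration}'")
    location = cps["components"][component]["configurations"][configuration]["location"]
    return cps_file, location


debug_file, debug_location = component_location("debug", "sample", "sample-shared")
```

Both configurations use the same library file name here. Because each lives under its own prefix, no name mangling such as a `_d` suffix is needed to keep them apart.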

Note that the 2 different folders do not necessarily have to be in the same tree.

Rationale

Consider a team creating a “foo” package. They build it as a static library for Linux in “Release” mode, and the package has no dependencies. The build (or the team) produces a foo.cps, packages everything in a foo.tgz, computes its “foo.sha256” checksum and uploads it to their servers, so it can be reused when building other applications that depend on “foo”. They may even create manifests (SBOMs) for those applications that trace the “foo” dependency, including its checksum.

Now the same team wants to introduce a sanitized build. Some sanitizers instrument the built artifacts, producing different binaries, and even some sanitizers strongly recommend or mandate that all the dependencies should be built with the same sanitizing flags. So the team proceeds to do a sanitized build of the “foo” library, which they do in a different folder, and they end up creating a new foo-san.tgz, that also includes its own internal foo.cps. Now they want to be able to build their applications against this sanitized “foo” binary, so they should be able to explicitly point the build system to the sanitized foo.cps file.

This example with sanitizers introduces the challenge of managing binary variants. The sanitizers themselves have variability and different combinations of sanitizers will result in different and often incompatible binaries, but this issue goes beyond the sanitizer example.

We can summarize these challenges:

- It is impossible to foresee and model all the possible variants and use cases in order to define a uniform foo-xxx.cps naming scheme that captures such variability. This applies both to file names and to a folder hierarchy whose folders represent segments of the binary variability.
- It is extremely inefficient to have to evict a previous binary configuration and overwrite it with a new one. It should be perfectly possible to have the unzipped “foo” and “foo-san” package binaries living side by side in independent folders, without even sharing the same hierarchy, so that both can be used independently without evicting or overwriting the other.
- The same rationale applies to different versions of the same package, which can be used by different branches or different consumer projects. The different versions cannot share the same space on disk, and it is also challenging to impose a naming convention that represents any possible version (or the lack of one).
- It is impossible to rely on merging all the different configurations into the same directory tree, because not all cases can be foreseen: a static library would have to be named something like “foo-release-sanitized-x86.a” in advance so that it could later be merged into the “lib” folder without conflict.

One important case where such a mapping is especially relevant and necessary is cross-building. When cross-building, the selection of dependencies depends on the “host” configuration (the one that will run the final binary), not the “build” configuration (the current build machine). This “host” configuration has huge variability and can depend on multiple factors, including low-level details of hardware boards. Modeling this configuration and trying to resolve its dependencies automatically can be daunting, and it is already one of the challenges with existing dependency location strategies. The problem is resolved trivially if users can explicitly define the location of the CPS files they want to use for their build.

There are other dimensions of binary variability beyond the current compilation options, and many developers need to support them. One common example is a shared library that is released as a final product to customers. This shared library might be built with exactly the same compiler, architecture, build type and compiler flags, on the exact same machine and from the exact same code, but with different dependencies, where the difference is something as trivial as a dependency getting a performance or security update. Users need to be able to have two different binaries of their shared library with all other inputs exactly the same, just with different dependencies, and to develop and test them in parallel. This use case is also almost impossible to model without an explicit mapping.

Finally, the “build-type” or “configuration” dimension also falls into this category. Even if some users create and release packages that contain more than one configuration, such as Release and Debug, this approach cannot be generalized:

- It is impossible to force users to build all configurations (including RelWithDebInfo, MinSizeRel, etc.) a priori.
- For some users, distributing Debug and Release artifacts together is a total blocker. Concerns such as security or IP force them to manage, store and distribute Debug build artifacts separately to avoid any risk of leakage to final users.
- The variability of configurations is a continuum, not a finite set: fine-grained optimization levels, debugging symbols, instrumentation, etc. are frequently tuned.
- As we have seen, it is not always possible to extend existing packages a posteriori with new build configurations, as that can destroy reproducibility and traceability or produce artifact collisions on disk.

Advantages

The current proposal brings many advantages, without necessarily impeding or forbidding other currently existing or future approaches:

- It scales very simply to any number of different binary configurations.
- It scales without any a priori knowledge or consensus about the possible binary configuration variants.
- It avoids all possible artifact name conflicts, such as having to name the library binary artifacts differently based on an infinite space of possible variants.
- It is well suited to the many cases where automatic location of CPS files can be challenging, such as cross-build scenarios.
- It does not invalidate the approach of having tools find and locate *.cps files in different locations according to some logic; quite the opposite. Such tools should be able to output a mapping file, improving debuggability and overall UX. Hybrid approaches, with users pinpointing exact locations for some dependencies and letting the tools find the others, also fit well.
- It removes a lot of responsibility from the tools, and lets users take on that responsibility and control that part, if that is what they want.
- It is future-proof and can absorb ecosystem mismatches in dependency names without the need for a global registry. For example, if some third-party CPS file refers to the “zlib” requirement as “zlib_pkg”, all the mapping needs is a new “zlib_pkg” entry pointing to the same CPS file as the “zlib” entry.
- It avoids the need to restrict the model to a limited number of standard configurations, and easily allows consuming tools to use different versions, flavors, binary variants, binaries built with different transitive dependencies, etc.
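
The name-mismatch point can be shown with a tiny sketch. The helper `with_alias` is hypothetical; it only demonstrates that an alias is one extra entry pointing at an existing path.

```python
from typing import Dict


def with_alias(mapping: Dict[str, str], alias: str, existing: str) -> Dict[str, str]:
    """Return a copy of the mapping in which `alias` points at `existing`'s CPS file."""
    if existing not in mapping:
        raise KeyError(f"cannot alias unknown entry '{existing}'")
    updated = dict(mapping)
    updated[alias] = mapping[existing]
    return updated


release = {"zlib": "/path/to/any/other/arbitrary/zlib.cps"}
# A third-party CPS file asks for "zlib_pkg"; serve it from the same file as "zlib".
release_with_alias = with_alias(release, "zlib_pkg", "zlib")
```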

mwoehlke · 2024-03-27T22:07:28Z

mwoehlke
Mar 27, 2024
Maintainer

It should be possible to tell the build systems exactly what CPS file is used for any given dependency, or for all of them.

Yes, that has always been the intent. But that's a function of the build system that doesn't need to be (and, I would argue, shouldn't be) specified by CPS. The search mechanism is what to do when you haven't been given specific instructions.

As the transitive dependencies are not directly referenced by the build systems...

...and IIUC you want the build tool to be able to override those, also? Sure, agreed, but see previous comment. (In this case, note that hints happen in between that and the default search paths.)

it should be possible to tell the build systems exactly what CPS file is used [...] for every configuration

Um. Okay, I can imagine cases where that might be useful, but I think you'll find it difficult getting CMake to implement it. I am opposed to suggesting that this is a requirement.

As implementation hint, the proposal could be implemented by one “cps-map.json” file

Perhaps, but this seems orthogonal to the core specification. I don't know if CMake would use it, since we already have our own mechanism.

the automatic search functionality could be skipped when an entry already exists in the mapping

If by "could be" you mean tools should try the specified path first, that's obviously how this would work. If you mean a package not found there should be a hard error... IMHO that should be up to the tool to decide. I have no problem if tools want to make that an optional behavior. I'm less convinced falling back to search shouldn't at least be an option, or that it shouldn't be the default option (e.g. as it is currently in CMake).

In summary: you are free to write a build tool that works this way. I see nothing in CPS that stops you from doing so. However, I suspect it would be extremely challenging for at least some build tools (e.g. CMake) to support what you are proposing.

1 reply

bretbrownjr Mar 28, 2024
Collaborator

I think you'll find it difficult getting CMake to implement it.

For what it's worth, it's trivial to implement. A CMake module that loads N different CPS files from absolute paths serves the same purpose.

And I guess one could override all find_package calls to be no-ops in a given workflow and then one could check for link libraries that aren't targets in a DEFER context to force errors for missing dependencies.

And I would expect the above to typically outperform the usual CMake find_package equivalent behavior.

(Guess how I know all this!)

mwoehlke · 2024-03-28T14:47:15Z

mwoehlke
Mar 28, 2024
Maintainer

it's trivial to implement

Do you have your own version of find_package that understands foo_DIR_<config>? How are you handling the necessary name-mangling of imported targets? Do you have helpers to generate the generator expressions necessary for per-config linking? It might be do-able, but I very much doubt it's drop-in with CMake's built-in functionality.

1 reply

bretbrownjr Mar 28, 2024
Collaborator

I'm going to use "CPS" in my descriptions here, but we are actually using *.pc files right now.

That's a lot of questions to answer all at once. Short facts that would clear things up for you a bit, at least:

- Most of the complexities you're worried about are the reason dependency mapping is interesting.
- Modelling CMake module identities and execution has to be separate from CPS identities.
- I recommend using target names to model CPS identities, though there are probably other ways to do it.
- If a CPS dependency is already imported, don't import it again. This gives the user a lot of control.
  - It's likely the CPS-capable find_package call will need to operate this way in CPS mode, at least by default.
  - Aside: We mostly don't use direct find_package right now because it doesn't follow this practice.
- We mostly use global imported targets.
- Per-config linking hasn't been an issue, though our dependency management system mostly handles providing dependencies of different build "flavors".

It's likely existing systems will eventually need to evolve and/or simplify to support full interop. I would prioritize simple support for CPS with backward compatibility to start with, especially if we're confident we can nudge users to a new semantic with more robust features as time goes on.

bretbrownjr · 2024-03-28T16:41:37Z

bretbrownjr
Mar 28, 2024
Collaborator

On a less important bikeshedding note, I'm thinking this model for separation of concerns is helpful to me:

| Action | Operates On | Produces |
| --- | --- | --- |
| provide dependencies | projects & packages | CPS files (libraries) |
| resolve dependencies | CPS names | dependency solution (dependency map) |
| build | source code & artifacts & flags & commands | artifacts |

The status quo is that current tools do some or all of those actions. CMake with FetchContent and Meson with wraps enabled do all of the above. Trivial CMake projects only do the last step. Conan with certain generators provides and resolves dependencies, with CMake (etc.) doing build actions from there. pkg-config and cps-config mostly support higher-level build systems on the dependency resolution and build steps.

Point being, for interop and eventual convergence, I think we need dependency maps as another "seam" in the architecture. On a technical level, it could be a separate project from CPS that specifies how dependency solutions are encoded and decoded, but it would be great for all us humans if we didn't have to keep two specifications collated.

2 replies

prince-chrismc Apr 1, 2024

This is an excellent summary. I think the order of operations here is important, where resolving the names comes before providing dependencies. Though it's very much an iterative process, as a CPS file would have more names to resolve. From the user's perspective, they only want to provide a list of the names they directly need, and the tooling is responsible for finding the initial CPS files.

bretbrownjr Apr 2, 2024
Collaborator

@mwoehlke Now that I've chewed this over a bit more in my subconscious, I think it's worth clarifying that a dependency solution exists to support one or more specific and "matching" link commands.

In contrast, CMake target graphs with generator expressions are more like templates (in the C++ sense) that are instantiated when a given link command is instantiated. The underlying model of a dependency solution logically exists at the moment of link command instantiation, though it's not exported for others to inspect or consume at any point, not even in the CMake introspection APIs.

I think we still need to ship something that works for existing CMake users, but having a more coherent model will be needed so that various build tools can have something coherent to build their interoperability around.

I'm also thinking that if we need generator expression support as a practical matter, we should consider how CPS needs to accommodate them. I don't believe an opaque "this is a string that means nothing" will really fly inside of a CPS file. Tools will want to see when a configuration named "Debug" adds security-problematic compilation flags and such.

mwoehlke · 2024-03-28T16:44:44Z

mwoehlke
Mar 28, 2024
Maintainer

If a CPS dependency is already imported, don't import it again.

How, then, if I've already imported the release Foo package, do I import the asan Foo package? This requires them to have separate identities, and it requires that different configurations of the consumer link to different targets. You can do the latter with generator expressions, but it's a lot messier than linking to different configurations of the same component/target.

A much more sane approach is to provide the release and asan builds as different configurations of the same package.

4 replies

bretbrownjr Mar 28, 2024
Collaborator

I would argue that's a different dependency solution that requires a different graph. They are not interchangeable. For instance, asan builds are not security-appropriate to link against in release builds. And falling back from asan dependencies to release dependencies will potentially break asan testing workflows. You want your dependency resolution to be specific about picking one or the other.

The CMakeLists.txt should work for both solutions, so it shouldn't be specific either way. Note that the schemas @memsharded proposes accommodate different "build types", to use CMake terminology. I'm still unclear on how we make sure everyone spells configurations the same way, but we can talk about that in another discussion I suppose, and it's an issue in the existing CMake model besides.

For the record, I like having N dependency solutions, one for each "flavor" or "configuration" or whatever we want to call it. It is more extensible and better conveys the idea of specific and coherent graphs as opposed to sets of nodes to sift through.

prince-chrismc Apr 1, 2024

Why do you see the specification providing the list of configurations?

As a user, I would want to specify the name of a proprietary configuration (a hardware code name, for instance), and my expectation is that the build system (all the tools) would use that exact one. I'd also expect it to error if I explicitly asked for one that does not exist or is invalid.

bretbrownjr Apr 2, 2024
Collaborator

From my experience with CMake build types and build configurations, describing them as arbitrary strings hasn't really served the expected function so far. That's why we still have "Debug" builds that allow PIC on or off depending on settings.

To rephrase, dependency provision and resolution actions are affected by decisions like whether position independent linking will be supported, so the dependency provider needs to be able to express whether a given CPS package is suitable for that purpose. I don't know how to name a "configuration" to provide that information. I do know how to provide a dependency solution (dependency map) that includes only dependencies fit to that task, though.

bretbrownjr Apr 2, 2024
Collaborator

To elaborate a bit more, I could see some well-defined properties per CPS component clarifying what it is and is not suitable for. I don't know that I am proposing exactly that right now. I think treating dependency solutions as first-class concepts in our docs and guides if not in the CPS spec will help clarify what to do with the fact that CPS doesn't have a way to precisely describe the capabilities to the full needs of a dependency resolution workflow. The status quo would be some instructions to be careful and make sure all projects sync up on expectations. I'm not satisfied with that as a long-term goal, at least.
