run exports incorrectly applied for packages with builds with different exports #3617

jjhelmus · 2019-07-18T20:33:12Z

Pulling this issues, reported by @minrk, out of #3610 since the original issue reported there has been address. Specifically the comments concerning run_exports with fftw.

If a channel contains packages with the same version number but different hashes or builds that express different run_exports, only one of these exports is captured in the channeldata.json file. The result is that the run_exports are incorrectly for the packages(s) which are not captured in the channeldata.json file. The slightly modified example from what @minrk gave in #3610 shows this behavior:

meta.yaml

package:
  name: testing
  version: 1.0

requirements:
  host:
    - fftw 3.3.8 nompi_h7f3a6c3_1106

$ conda render --channel https://conda-static.anaconda.org/conda-forge .
--------------
Hash contents:
--------------
{}
----------
meta.yaml:
----------
package:
    name: testing
    version: '1.0'
requirements:
    host:
        - _libgcc_mutex 0.1 main
        - libgfortran-ng 7.3.0 hdf63c60_0
        - libgcc-ng 9.1.0 hdf63c60_0
        - fftw 3.3.8 nompi_h7f3a6c3_1106
    run:
        - fftw * mpi_mpich_*
extra:
    copy_test_source_files: true
    final: true

Notice that the mpi_mpich from channeldata.json is being applied even though the nompi package is in the host environment.

With conda 3.17.8 the rendering uses the run_export information contained within the package itself:

conda render --channel https://conda-static.anaconda.org/conda-forge test/
--------------
Hash contents:
--------------
{}
----------
meta.yaml:
----------
package:
    name: testing
    version: '1.0'
requirements:
    host:
        - _libgcc_mutex 0.1 main
        - libgfortran-ng 7.3.0 hdf63c60_0
        - libgcc-ng 9.1.0 hdf63c60_0
        - fftw 3.3.8 nompi_h7f3a6c3_1106
extra:
    copy_test_source_files: true
    final: true

The text was updated successfully, but these errors were encountered:

jjhelmus · 2019-07-18T20:38:20Z

This would also occur if the packages in different subdirs (e.g. osx-64 and linux-64) had different run_exports as channeldata.json only considers the version of the package not the subdir.

jjhelmus · 2019-07-18T20:48:49Z

I see two options for fixing this.

Always use the run_export information from the packages. This requires that all packages in the build and host environments be downloaded when a recipe is rendered which can be slow.
Expand the run_export information in channeldata.json to include subdir, version and build_string information.

My preference would be for the first option. Keeping run export information in two locations breaks the single responsibility principle and requires care that the two locations contain the same information. Using the information from the packages slows down rendering but during a build the packages will be available (they are installed into the environment after all) so it seem silly not to use the information they contain.

A compromise would be to add a configuration option that would switch to using channeldata. This may be useful when the speed of rendering a recipe is more important than the correctness.

jjhelmus · 2019-07-18T20:49:44Z

#3603 implements option 1

msarahan · 2019-07-18T22:10:12Z

I don't think it's a big deal to have run_exports in both packages and channeldata.json. Unlike index data, we don't hotfix this. I don't think subdir is necessary - the build strings or the hash_input.json data will likely include platform-specific differentiators, such as compiler, that will imply platform.

This said, I'm OK with the first choice, but only if we can come up with a way of caching the graph so that rendering can still be fast. We can perhaps explore using libcfgraph or something. I'm not picky on any particular solution, but we have to scale up our builds. This change represents a roughly 20% speedup in rendering (variable with the number of packages involved and also whether or not said packages are cached). That's not much on a single build, but it adds up, especially for large trees like our proposed R buildout.

mingwandroid · 2019-07-18T22:14:17Z

What's the slow part? I don't see why, once using .conda packages extraction of this data should be fast. You need your dependencies when building anyway. Ok when only using conda render with no intention of building it'll be slowerbbut I don't think that's the common case?

Overall I don't mind which way we go.

msarahan · 2019-07-18T22:21:09Z

reading one big JSON file is still miles faster than looping through many smaller ones. Rendering is core to our ability to do big things right now. If the need for rendering is obviated somehow (like caching a graph from post-build data), we can drop the channeldata scheme.

jakirkham · 2019-07-18T22:21:36Z

Ok when only using conda render with no intention of building it'll be slowerbbut I don't think that's the common case?

This may come up when conda-smithy re-renders a feedstock. So yes could be common.

mariusvniekerk · 2019-07-19T00:57:35Z

Frankly, conda-smithy rerenders aren't that bad atm (mostly since they tend to be fairly tightly scoped). Being correct is FAR more important imo. libcfgraph is basically just a post package upload metadata extraction, which should be easier to do with .conda.

dhirschfeld · 2019-07-19T01:12:19Z

Being correct is FAR more important imo

+:100: I can't agree with this enough!

If there is ever any tension between performance and correctness, correctness has to win.

If conda doesn't do the right thing it is literally unusable as it will create broken environments.

jjhelmus · 2019-07-19T19:20:11Z

#3621 drops the default behavior back to downloading packages to determine run_exports while allowing an opt-in use of channeldata using the --use-channeldata command line argument. Work is still needed to support variants in channeldata.json but the default behavior would correctly determine run_exports.

mbargull · 2019-07-22T21:12:54Z

Having the build-related metadata separately available outside of the package archives surely makes sense, conceptually and performance-wise.
But if it's inconsistent with the source data, we really shouldn't use it until it is.

Meaning, if gh-3603 restores correctness, then please merge that PR.
When channeldata.json gets (more?) consistent with the actual metadata, a configuration option might make sense. Until then, IDK, as we know something is not right, such an option would be equivalent to --maybe-do-incorrect-stuff, i.e., it's not of much use...

N.B.: I don't know what channeldata.json contains or how it's used. (Is its use for run_exports new in 3.18? I'm a little out of the loop...)

Expand the run_export information in channeldata.json to include subdir, version and build_string information.

I don't think subdir is necessary - the build strings or the hash_input.json data will likely include platform-specific differentiators, such as compiler, that will imply platform.

I'm confused. How would you be able to use the run_exports information correctly if you don't have the full build information (channel, subdir, version, (build number,) build string) available? Unless you impose restrictions on how/when the packager is able to apply/change run_exports in a build's meta.yaml, this can't work. (Imposing necessary restriction -- what ever those may be -- would be a breaking change, of course.)
If channeldata.json is agnostic to subdirs and build strings and such, it cannot be a valid input for figuring out run_exports. You simply need a repodata.json-like file that is augmented with build-only information like run_exports definitions. I'm confused.

msarahan · 2019-07-22T22:43:17Z

Channeldata.json was something that @kalefranz brought in a long, long time ago, when he refactored conda-build's indexing. It wasn't really used for anything. I started to use it for run_exports as an optimization.

How would you be able to use the run_exports information correctly if you don't have the full build information (channel, subdir, version, (build number,) build string) available?

channel is implicit - which url is channeldata.json coming from?
subdir is (almost always) implicit, because almost all packages that have hashes have them because a compiler is used. Compilers are specific to platforms, by virtue of the compiler name always having the target platform as a suffix. This means the hash itself encodes subdir. This is probably practically good enough, but it is not 100% guaranteed that if you have a hash, you had a compiler. So OK, maybe we need subdir.
version: of course
build number: I hope not
build string: yeah, this one is important.

The idea of channeldata.json was to aggregate central stuff from all the packages in one relatively condensed file. Maybe it can still serve this purpose with deeper indexes for run_exports. I'm ambivalent.

kalefranz · 2019-07-22T23:43:15Z

It wasn't really used for anything.

Actually that’s not true. The history actually predates me. It goes back to Ilan’s metadata.json file that was used by Black Duck. For many reasons it’s useful to have some form of high level index of the channel, not just the individual subdirs (i.e. repodata.json). The metadata.json file format has some structural issues, so when Gonzalo wanted to start using it for Navigator, we worked together on the improved format. The file is now used by Navigator. It’s also likely used by VSCode at this point, since they had a request in a while back to give conda the ability to interact with channel-wide information rather than just what’s in repodata. My guess is that pycharm might want to do something similar.

minrk · 2019-07-25T07:39:03Z

What would it mean to exclude build number? run_exports can change from one build to the next, so it seems like run_exports has to be keyed by the full package id, including build number.

Take the not-entirely-realistic-but-valid meta.yaml:

package:
  name: has-exports
build:
  number: 1
  run_exports:
      - {{ pin_compatible('libsodium', max_pin='x.x.x') }}

requirements:
  build:
    - {{ compiler('c') }}
  host:
    - libsodium

build 1 has libsodium[>=1.0.17,<1.0.18a0]
some time later, rebuild with build 2 after there's been a libsodium release. hash input has not changed, but run_exports is now libsodium[>=1.0.18,<1.0.19a0]
build a downstream package, depending on has-exports and the previous libsodium==1.0.17

The downstream package should pull in build 1, since it's incompatible with build 2 due to version pinning. If it gets run_exports from build 2, downstream will have mutually exclusive run dependencies on exactly 1.0.17 and 1.0.18.

So it seems like run_exports must include the build number, or the hash inputs need to be expanded so these two builds wouldn't share the same hash.

jakirkham · 2019-07-27T03:58:17Z

Also ran into this recently.

jjhelmus · 2019-07-30T21:59:59Z

@minrk I agree, run_exports can change between builds, hashes, variants and likely some other differences none of us have considered.

When building a conda package the run_export information of the packages in the build and host environment will always available from the packages themselves, so there is not reason not to use these.

When rendering but not building a recipe there are performance improvements if the packages themselves do not need to be downloaded. This information is available in channeldata but is incomplete and cases where run_exports are different for different variants, builds, etc. Still if you know the channels you are using have consistent run_exports or the use cases can handle small inconsistencies in rendering, this trade off is often acceptable.

For example, at Anaconda have some workflows which determine package build orders by rendering recipes. For this use case using channeldata is fine, the majority (maybe all?) of the packages in defaults have consistent run_exports and if they are incorrect during the builds the correct run_exports from the packages themselves are used.

A PR adding the necessary information to channeldata to account for builds, hashes, etc would be welcome but is not a priority for the Anaconda team.

github-actions · 2023-02-10T04:03:03Z

Hi there, thank you for your contribution!

This issue has been automatically marked as stale because it has not had recent activity. It will be closed automatically if no further activity occurs.

If you would like this issue to remain open please:

Verify that you can still reproduce the issue at hand
Comment that the issue is still reproducible and include:
- What OS and version you reproduced the issue on
- What steps you followed to reproduce the issue

NOTE: If this issue was closed prematurely, please leave a comment.

Thanks!

jakirkham · 2023-03-15T10:16:40Z

Could we please reopen this?

jjhelmus mentioned this issue Jul 18, 2019

run_exports not applied when channel does not provide channeldata.json #3583

Closed

jjhelmus mentioned this issue Jul 18, 2019

Pull conda-build 3.18.x ? conda-forge/conda-build-feedstock#106

Closed

isuruf mentioned this issue Jul 19, 2019

bonsu: new recipe conda-forge/staged-recipes#8842

Merged

5 tasks

mbargull mentioned this issue Jul 24, 2019

Some fixes ( <-- most original/descriptive title... ) bioconda/bioconda-utils#593

Merged

jaimergp mentioned this issue Jun 25, 2021

Add run_export to pytorch conda-forge/pytorch-cpu-feedstock#43

Merged

5 tasks

kenodegard added this to 🧭 Planning May 11, 2022

kenodegard removed this from 🧭 Planning May 13, 2022

github-actions bot added the stale [bot] marked as stale due to inactivity label Feb 10, 2023

kenodegard added this to 🧭 Planning Feb 24, 2023

github-project-automation bot moved this to 🆕 New in 🧭 Planning Feb 24, 2023

github-actions bot added the stale::closed [bot] closed after being marked as stale label Mar 13, 2023

github-actions bot closed this as completed Mar 13, 2023

github-project-automation bot moved this from 🆕 New to 🏁 Done in 🧭 Planning Mar 13, 2023

github-actions bot added the locked [bot] locked due to inactivity label Mar 15, 2024

github-actions bot locked as resolved and limited conversation to collaborators Mar 15, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

run exports incorrectly applied for packages with builds with different exports #3617

run exports incorrectly applied for packages with builds with different exports #3617

jjhelmus commented Jul 18, 2019 •

edited

Loading

jjhelmus commented Jul 18, 2019

jjhelmus commented Jul 18, 2019

jjhelmus commented Jul 18, 2019

msarahan commented Jul 18, 2019

mingwandroid commented Jul 18, 2019

msarahan commented Jul 18, 2019

jakirkham commented Jul 18, 2019

mariusvniekerk commented Jul 19, 2019

dhirschfeld commented Jul 19, 2019

jjhelmus commented Jul 19, 2019

mbargull commented Jul 22, 2019

msarahan commented Jul 22, 2019

kalefranz commented Jul 22, 2019

minrk commented Jul 25, 2019 •

edited

Loading

jakirkham commented Jul 27, 2019

jjhelmus commented Jul 30, 2019

github-actions bot commented Feb 10, 2023

jakirkham commented Mar 15, 2023

run exports incorrectly applied for packages with builds with different exports #3617

run exports incorrectly applied for packages with builds with different exports #3617

Comments

jjhelmus commented Jul 18, 2019 • edited Loading

jjhelmus commented Jul 18, 2019

jjhelmus commented Jul 18, 2019

jjhelmus commented Jul 18, 2019

msarahan commented Jul 18, 2019

mingwandroid commented Jul 18, 2019

msarahan commented Jul 18, 2019

jakirkham commented Jul 18, 2019

mariusvniekerk commented Jul 19, 2019

dhirschfeld commented Jul 19, 2019

jjhelmus commented Jul 19, 2019

mbargull commented Jul 22, 2019

msarahan commented Jul 22, 2019

kalefranz commented Jul 22, 2019

minrk commented Jul 25, 2019 • edited Loading

jakirkham commented Jul 27, 2019

jjhelmus commented Jul 30, 2019

github-actions bot commented Feb 10, 2023

jakirkham commented Mar 15, 2023

jjhelmus commented Jul 18, 2019 •

edited

Loading

minrk commented Jul 25, 2019 •

edited

Loading