-
Notifications
You must be signed in to change notification settings - Fork 428
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
run exports incorrectly applied for packages with builds with different exports #3617
Comments
This would also occur if the packages in different subdirs (e.g. osx-64 and linux-64) had different run_exports as |
I see two options for fixing this.
My preference would be for the first option. Keeping run export information in two locations breaks the single responsibility principle and requires care that the two locations contain the same information. Using the information from the packages slows down rendering but during a build the packages will be available (they are installed into the environment after all) so it seem silly not to use the information they contain. A compromise would be to add a configuration option that would switch to using channeldata. This may be useful when the speed of rendering a recipe is more important than the correctness. |
#3603 implements option 1 |
I don't think it's a big deal to have run_exports in both packages and channeldata.json. Unlike index data, we don't hotfix this. I don't think subdir is necessary - the build strings or the This said, I'm OK with the first choice, but only if we can come up with a way of caching the graph so that rendering can still be fast. We can perhaps explore using libcfgraph or something. I'm not picky on any particular solution, but we have to scale up our builds. This change represents a roughly 20% speedup in rendering (variable with the number of packages involved and also whether or not said packages are cached). That's not much on a single build, but it adds up, especially for large trees like our proposed R buildout. |
What's the slow part? I don't see why, once using .conda packages extraction of this data should be fast. You need your dependencies when building anyway. Ok when only using conda render with no intention of building it'll be slowerbbut I don't think that's the common case? Overall I don't mind which way we go. |
reading one big JSON file is still miles faster than looping through many smaller ones. Rendering is core to our ability to do big things right now. If the need for rendering is obviated somehow (like caching a graph from post-build data), we can drop the channeldata scheme. |
This may come up when conda-smithy re-renders a feedstock. So yes could be common. |
Frankly, conda-smithy rerenders aren't that bad atm (mostly since they tend to be fairly tightly scoped). Being correct is FAR more important imo. |
+:100: I can't agree with this enough! If there is ever any tension between performance and correctness, correctness has to win. If conda doesn't do the right thing it is literally unusable as it will create broken environments. |
#3621 drops the default behavior back to downloading packages to determine |
Having the build-related metadata separately available outside of the package archives surely makes sense, conceptually and performance-wise. Meaning, if gh-3603 restores correctness, then please merge that PR. N.B.: I don't know what
I'm confused. How would you be able to use the |
Channeldata.json was something that @kalefranz brought in a long, long time ago, when he refactored conda-build's indexing. It wasn't really used for anything. I started to use it for run_exports as an optimization.
The idea of channeldata.json was to aggregate central stuff from all the packages in one relatively condensed file. Maybe it can still serve this purpose with deeper indexes for run_exports. I'm ambivalent. |
Actually that’s not true. The history actually predates me. It goes back to Ilan’s metadata.json file that was used by Black Duck. For many reasons it’s useful to have some form of high level index of the channel, not just the individual subdirs (i.e. repodata.json). The metadata.json file format has some structural issues, so when Gonzalo wanted to start using it for Navigator, we worked together on the improved format. The file is now used by Navigator. It’s also likely used by VSCode at this point, since they had a request in a while back to give conda the ability to interact with channel-wide information rather than just what’s in repodata. My guess is that pycharm might want to do something similar. |
What would it mean to exclude build number? run_exports can change from one build to the next, so it seems like run_exports has to be keyed by the full package id, including build number. Take the not-entirely-realistic-but-valid meta.yaml: package:
name: has-exports
build:
number: 1
run_exports:
- {{ pin_compatible('libsodium', max_pin='x.x.x') }}
requirements:
build:
- {{ compiler('c') }}
host:
- libsodium
The downstream package should pull in build 1, since it's incompatible with build 2 due to version pinning. If it gets run_exports from build 2, So it seems like run_exports must include the build number, or the hash inputs need to be expanded so these two builds wouldn't share the same hash. |
Also ran into this recently. |
@minrk I agree, run_exports can change between builds, hashes, variants and likely some other differences none of us have considered. When building a conda package the run_export information of the packages in the build and host environment will always available from the packages themselves, so there is not reason not to use these. When rendering but not building a recipe there are performance improvements if the packages themselves do not need to be downloaded. This information is available in For example, at Anaconda have some workflows which determine package build orders by rendering recipes. For this use case using channeldata is fine, the majority (maybe all?) of the packages in defaults have consistent A PR adding the necessary information to channeldata to account for builds, hashes, etc would be welcome but is not a priority for the Anaconda team. |
Hi there, thank you for your contribution! This issue has been automatically marked as stale because it has not had recent activity. It will be closed automatically if no further activity occurs. If you would like this issue to remain open please:
NOTE: If this issue was closed prematurely, please leave a comment. Thanks! |
Could we please reopen this? |
Pulling this issues, reported by @minrk, out of #3610 since the original issue reported there has been address. Specifically the comments concerning run_exports with fftw.
If a channel contains packages with the same version number but different hashes or builds that express different run_exports, only one of these exports is captured in the
channeldata.json
file. The result is that the run_exports are incorrectly for the packages(s) which are not captured in the channeldata.json file. The slightly modified example from what @minrk gave in #3610 shows this behavior:meta.yaml
Notice that the
mpi_mpich
from channeldata.json is being applied even though thenompi
package is in the host environment.With conda 3.17.8 the rendering uses the run_export information contained within the package itself:
The text was updated successfully, but these errors were encountered: