Conda Crawler Support #535

Closed
lamarrr opened this issue Dec 19, 2023 · 1 comment · Fixed by #532

lamarrr commented Dec 19, 2023

Background

Conda exposes packages in a different format from other Python repositories such as PyPI. A conda environment is locked to a specific Python version.
Conda pins packages to specific versions within a given version of the channel. Because the packages are managed for compatibility, this keeps them from breaking due to one incompatibility or another, similar to how you'd ship a Docker container.
The primary consumption point is the "packages" themselves. They are accompanied by scripts that modify the environment and set up the packages and their dependencies, which are then consumed by the setup application. Packages may also contain DLLs, scripts, compiled Python bytecode (.pyc), Python code, etc.
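To make the package contents concrete, here is a minimal Python sketch that opens a legacy `.tar.bz2` conda package with the standard-library `tarfile` module and keeps only the metadata and license files, skipping the DLLs, scripts, and Python payload. It assumes the conventional layout (metadata in `info/index.json`, license files under `info/licenses/` or, in older builds, `info/LICENSE.txt`) and does not handle the newer `.conda` archive format. `inspect_package` and `build_fake_package` are hypothetical helpers, not part of the crawler.

```python
import io
import json
import tarfile


def inspect_package(data: bytes) -> dict:
    """Collect index metadata and license texts from a .tar.bz2 conda package.

    Assumes the conventional layout: metadata in info/index.json and license
    files under info/licenses/ (older builds may ship info/LICENSE.txt).
    """
    result = {"index": None, "licenses": {}}
    with tarfile.open(fileobj=io.BytesIO(data), mode="r:bz2") as tar:
        for member in tar.getmembers():
            if not member.isfile():
                continue
            is_index = member.name == "info/index.json"
            is_license = (member.name.startswith("info/licenses/")
                          or member.name == "info/LICENSE.txt")
            if not (is_index or is_license):
                continue  # skip the payload: DLLs, scripts, .pyc, Python code
            text = tar.extractfile(member).read().decode("utf-8", errors="replace")
            if is_index:
                result["index"] = json.loads(text)
            else:
                result["licenses"][member.name] = text
    return result


def build_fake_package() -> bytes:
    """Build a tiny in-memory package for demonstration (hypothetical contents)."""
    files = {
        "info/index.json": json.dumps({"name": "numpy", "version": "1.13.0"}),
        "info/licenses/LICENSE.txt": "example license text",
        "lib/python3.6/site-packages/numpy/__init__.py": "",
    }
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:bz2") as tar:
        for name, text in files.items():
            payload = text.encode("utf-8")
            info = tarfile.TarInfo(name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


if __name__ == "__main__":
    summary = inspect_package(build_fake_package())
    print(summary["index"]["name"], sorted(summary["licenses"]))
```

A real crawler would read the archive bytes from the channel rather than building them in memory.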
The structure of conda repositories and their indexing process are described here: https://docs.conda.io/projects/conda-build/en/stable/concepts/generating-index.html
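The linked page describes a channel as a set of platform subdirectories (such as `linux-64` or `noarch`), each indexed by a `repodata.json` file. As a rough illustration, the Python sketch below scans an already-parsed `repodata.json` for every build of one package. It assumes the top-level `packages` (legacy `.tar.bz2`) and `packages.conda` keys and the per-entry `name`, `version`, and `build` fields; `find_builds` and the sample data are hypothetical.

```python
from typing import Any


def find_builds(repodata: dict[str, Any], name: str) -> list[tuple[str, str, str]]:
    """Return (filename, version, build) for each build of `name` in one subdir."""
    matches = []
    # Legacy .tar.bz2 builds and newer .conda builds are indexed under separate keys.
    for key in ("packages", "packages.conda"):
        for filename, meta in repodata.get(key, {}).items():
            if meta.get("name") == name:
                matches.append((filename, meta.get("version", ""), meta.get("build", "")))
    return sorted(matches)


# Illustrative, hand-written excerpt of a linux-aarch64 repodata.json.
sample = {
    "info": {"subdir": "linux-aarch64"},
    "packages": {
        "numpy-1.13.0-py36_0.tar.bz2": {"name": "numpy", "version": "1.13.0", "build": "py36_0"},
    },
    "packages.conda": {
        "numpy-1.13.0-py27_0.conda": {"name": "numpy", "version": "1.13.0", "build": "py27_0"},
        "scipy-1.0.0-py36_0.conda": {"name": "scipy", "version": "1.0.0", "build": "py36_0"},
    },
}

if __name__ == "__main__":
    for entry in find_builds(sample, "numpy"):
        print(entry)
```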

Conda has three main channels: anaconda-main, anaconda-r, and conda-forge, which is more geared towards business uses.

We crawl both the packages and the source code (when it is specified) for licensing metadata and other metadata about the package.

The source from which conda packages are built is often, but not always, provided via a URL linking to a compressed source file hosted externally, sometimes on GitHub or another website. Note that this is a file, not a git repository.
The main conda package is hosted on the conda channels themselves. It is compressed and contains the licensing information, compilers, environment configuration scripts, dependencies, etc. needed to make the package work.
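The split between the two crawl targets can be sketched in Python as a function that maps a coordinate type to the file to fetch. This is an illustration under stated assumptions: the channel base URLs in `CHANNEL_URLS` are assumed, the package archive is assumed to live at `{base}/{subdir}/{filename}`, and the source URL is assumed to come from a `source_url` field (a single string) in the channel's `channeldata.json`. `resolve_download_url` and `sample_channeldata` are hypothetical.

```python
from typing import Optional

# Assumed base URLs for the three providers named in this issue.
CHANNEL_URLS = {
    "conda-forge": "https://conda.anaconda.org/conda-forge",
    "anaconda-main": "https://repo.anaconda.com/pkgs/main",
    "anaconda-r": "https://repo.anaconda.com/pkgs/r",
}


def resolve_download_url(coord_type: str, provider: str, name: str,
                         channeldata: dict, subdir: str = "",
                         filename: str = "") -> Optional[str]:
    """Return the file to fetch for a coordinate, or None if no source is listed."""
    base = CHANNEL_URLS[provider]
    if coord_type == "conda":
        # The built package is a compressed archive hosted on the channel itself.
        return f"{base}/{subdir}/{filename}"
    if coord_type == "condasource":
        # The source is an externally hosted file (not a git repository) and is optional.
        return channeldata.get("packages", {}).get(name, {}).get("source_url")
    raise ValueError(f"unknown coordinate type: {coord_type}")


# Hypothetical channeldata.json excerpt for illustration only.
sample_channeldata = {
    "packages": {"numpy": {"source_url": "https://example.com/numpy-1.13.0.tar.gz"}},
}

if __name__ == "__main__":
    print(resolve_download_url("conda", "conda-forge", "numpy", sample_channeldata,
                               "linux-aarch64", "numpy-1.13.0-py36_0.tar.bz2"))
    print(resolve_download_url("condasource", "conda-forge", "numpy", sample_channeldata))
```

Returning `None` for a missing `source_url` reflects that the source is not always provided.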

The crawler uses coordinates with the following syntax:

type: conda | condasource
provider: conda-forge | anaconda-main | anaconda-r
namespace: -
name: any
revision: (((${version}|-)_(${architecture}|-))|-)
toolVersion: (${toolVersion}|-)

For example:

conda/conda-forge/-/numpy/1.13.0_linux-aarch64/py36
condasource/conda-forge/-/numpy/_
conda/conda-forge/-/numpy/-/py36
conda/conda-forge/-/numpy/1.13.0_/py36
conda/conda-forge/-/numpy/_linux-aarch64/py36
conda/anaconda-main/-/numpy/_/py27
conda/anaconda-main/-/numpy/_/-
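The grammar and examples above can be parsed with a short Python sketch. It is an illustration, not the crawler's actual code: `parse_coordinates` and `CondaCoordinates` are hypothetical names. Because the examples use forms like `1.13.0_` and `_`, it treats both `-` and an empty side of `_` as "unspecified", and it splits the revision on its last `_`, assuming architecture names such as `linux-aarch64` never contain an underscore.

```python
from dataclasses import dataclass
from typing import Optional

TYPES = {"conda", "condasource"}
PROVIDERS = {"conda-forge", "anaconda-main", "anaconda-r"}


def _opt(value: str) -> Optional[str]:
    # Both "-" and an empty string mean "not specified".
    return None if value in ("", "-") else value


@dataclass
class CondaCoordinates:
    coord_type: str
    provider: str
    name: str
    version: Optional[str]
    architecture: Optional[str]
    tool_version: Optional[str]


def parse_coordinates(path: str) -> CondaCoordinates:
    parts = path.split("/")
    if len(parts) not in (5, 6):
        raise ValueError(f"expected 5 or 6 segments: {path}")
    coord_type, provider, namespace, name, revision = parts[:5]
    tool_version = parts[5] if len(parts) == 6 else "-"
    if coord_type not in TYPES or provider not in PROVIDERS or namespace != "-":
        raise ValueError(f"invalid coordinates: {path}")
    version = architecture = None
    if revision != "-":
        # Split on the last "_" so the architecture part stays intact.
        ver, sep, arch = revision.rpartition("_")
        if not sep:
            raise ValueError(f"revision must look like version_architecture: {revision}")
        version, architecture = _opt(ver), _opt(arch)
    return CondaCoordinates(coord_type, provider, name, version, architecture,
                            _opt(tool_version))


EXAMPLES = [
    "conda/conda-forge/-/numpy/1.13.0_linux-aarch64/py36",
    "condasource/conda-forge/-/numpy/_",
    "conda/conda-forge/-/numpy/-/py36",
    "conda/conda-forge/-/numpy/1.13.0_/py36",
    "conda/conda-forge/-/numpy/_linux-aarch64/py36",
    "conda/anaconda-main/-/numpy/_/py27",
    "conda/anaconda-main/-/numpy/_/-",
]

if __name__ == "__main__":
    for example in EXAMPLES:
        print(parse_coordinates(example))
```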

The fields are defined as follows:
type (required): conda or condasource
name: name of the package
provider (required): channel to crawl the package from: conda-forge, anaconda-main, or anaconda-r
revision (optional): package version and architecture, e.g. 0.3.0_win64. For the conda type, if no architecture is specified, any architecture is chosen. condasource packages don't need the architecture part of the revision, as they are not architecture-specific.
toolVersion (optional): the build version of the package. This is usually a conda-specific representation of the build tools, the environment configuration, and the build iteration of the package; e.g. for a Python 3.9 environment, this could be py39H443E.
If none is specified, the latest one is selected using lexicographical order.
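The selection rules above can be sketched in Python as a filter followed by a lexicographic maximum. Two parts are assumptions, not stated in this issue: a specified toolVersion such as `py36` is treated as a prefix of the build string (e.g. `py36_0`), and ties are broken on version, then build, then architecture. `Build` and `select_build` are hypothetical names.

```python
from typing import NamedTuple, Optional


class Build(NamedTuple):
    version: str
    architecture: str  # channel subdir, e.g. linux-aarch64
    build: str         # build string, e.g. py36_0


def select_build(builds: list[Build], version: Optional[str] = None,
                 architecture: Optional[str] = None,
                 tool_version: Optional[str] = None) -> Optional[Build]:
    candidates = [
        b for b in builds
        if (version is None or b.version == version)
        # No architecture specified: any architecture is acceptable.
        and (architecture is None or b.architecture == architecture)
        # Assumption: a toolVersion such as "py36" matches builds that start with it.
        and (tool_version is None or b.build.startswith(tool_version))
    ]
    if not candidates:
        return None
    # Unspecified fields resolve to the lexicographically greatest candidate.
    return max(candidates, key=lambda b: (b.version, b.build, b.architecture))


builds = [
    Build("1.13.0", "linux-aarch64", "py36_0"),
    Build("1.13.0", "linux-64", "py27_0"),
    Build("1.9.0", "linux-64", "py36_0"),
]

if __name__ == "__main__":
    print(select_build(builds, tool_version="py36"))
    print(select_build(builds, version="1.13.0", architecture="linux-64"))
```

Note that plain string ordering ranks `1.9.0` above `1.13.0`, so the first call picks the 1.9.0 build; the sketch follows the lexicographic rule as stated rather than conda's own version ordering.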

Conda-forge is a community effort; packages are published by opening PRs on its GitHub repository, as described here: https://conda-forge.org/docs/maintainer/adding_pkgs.html

@RazaAli99

Hi
