Proposal: Release artifact build and import process #957

Closed
chouseknecht opened this issue Jul 25, 2018 · 23 comments

@chouseknecht
Contributor

Background

There are two models of building content: “push” and “pull”. In a “push” model, the user builds an artifact (e.g., software package, content archive, container image, etc.) locally, and pushes it to a content server. In a “pull” model, the content server downloads or pulls the source code, and builds the artifact for the user. In both models, there are defined procedures, formats, metadata, and supporting tooling to aid in producing a release artifact.

Most popular content services use a “push” model, including PyPI (Python packages), Crates.io (Rust packages), and npm (Node.js packages). For these services, the content creator transforms the source code into a package artifact, and takes on the responsibility of testing, building, and pushing the artifact to the content server.

In rare cases content services take on the process of building artifacts. Docker Hub is one such example, where a content creator is able to configure an automated build process. The build process is triggered by a notification from a source code hosting service (i.e., GitHub or Bitbucket), when new code is merged. In response to the notification, Docker Hub downloads the new code, and generates a new image.

Problem Description

The Galaxy import process works as a “pull” model that can be initiated manually via the Galaxy website, or triggered automatically via a webhook from the Travis CI platform. However, unlike other content services, Galaxy does not enforce an artifact format, does not provide a specification for artifact metadata, and does not provide tooling to aid in building release artifacts.

When it comes to versioning content, Galaxy relies on git tags stored in the source code hosting service (GitHub). These tags point to a specific commit within the source code history. Each tag represents a point in time within the source code lifecycle, and is only useful within the context of a git repository. Removing the source code from the repository and placing it in an artifact causes the git tags to be lost, and with it any notion of the content version.

Galaxy provides no concept of repository level metadata, where information such as a version number, name and namespace might be located and associated with a release artifact. Metadata is currently only defined at the content level. For example, Ansible roles contain metadata stored in a meta/main.yml file, and modules contain metadata within their source code. Combine multiple content items and types into a single release artifact, and the metadata becomes ambiguous.

The Galaxy import process does not look for a release artifact, but instead clones the GitHub repository, and inspects the local clone. This means that any notion of content version it discovers and records comes directly from git tags. It’s not able to detect when a previously recorded version of the content has been altered, nor is it able to help an end user verify that the content being downloaded is the expected content. It’s also not able to inspect and test release artifacts, and therefore can offer no assurances to the end user of the content.

Since it doesn’t interact with release artifacts, Galaxy, as you might expect, offers no prescribed process or procedures for creating a release archive, nor does it offer any tooling to assist in the creation of a release archive. The good news is, Galaxy is a blank canvas in this regard.

Proposed Solution

Define repository metadata and build manifest

A repository metadata file, galaxy.toml, will be placed at the root of the project directory tree, and contain information such as: author, license, name, namespace, etc. It will hold any attributes required to create a release artifact from the repository source tree.
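
As a rough illustration only (the field names below are ours, not a defined schema), such a file might look like:

# galaxy.toml -- hypothetical example; the exact fields are not defined by this proposal
namespace = "my_namespace"
name = "my_collection"
version = "1.0.0"          # Semantic Version of the next release
author = "Jane Developer"
license = "GPL-3.0-or-later"
description = "Roles, modules, and plugins for managing example infrastructure."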

The archive build process (defined later) will package the repository source contents (e.g., roles, modules, plugins, etc.), and generate a build manifest file. The generated manifest file will contain the metadata found in galaxy.toml, plus information about the package structure and contents, and information about the release, including the version number.

The generated manifest file will be a JSON formatted file called METADATA that will be added to the root of the release artifact during the build process. Consumers of the release artifact, such as the Galaxy CLI, and the Galaxy import process, will be able to read the manifest file, and verify information about the release and its contents.

Enable Mazer to build packages

Given a defined package structure and a process for building a release artifact, it makes sense to build into Mazer the components that automate the artifact build process.

Use GitHub Releases as content storage

GitHub Releases will be the mechanism for storing and sharing release archives. GitHub provides an API that can be used by CI platforms and Mazer to push release artifacts to GitHub.

Mazer will be extended with the ability to push a release artifact to GitHub. This provides a single, consistent method for content creators to automate release pushes that can be called from any CI platform.

Notify the Galaxy server when new release artifacts are available

On the Galaxy server, add the ability for users to generate an API token that can be used by clients, such as Mazer, to authenticate with the API.

Extend Mazer with the ability to trigger an import process. Mazer will authenticate with the API via a user’s API token, and trigger an import of the newly available release.

Verify release artifacts

Enable Mazer to verify the integrity of release artifacts downloaded from GitHub at the time of installation.

There are several solutions widely used for verifying the integrity of a downloaded artifact, including checksums and digital signatures. In general, a checksum guarantees integrity, but not authenticity. A digital signature guarantees both integrity and authenticity.

Using a digital signature for user content requires a complex process of maintaining a trusted keychain, and still does not guarantee perfect authenticity. Since release artifacts are not hosted by Galaxy, but rather by a third party, it’s impossible to perfectly guarantee authenticity.

However, since Galaxy is a centralized package index, and data transfer between the Galaxy server and client is secured via TLS encryption, Galaxy can be considered a trusted source of metadata, and integrity verification can be achieved by storing release artifact checksums on the Galaxy server.

During import of a repository, Galaxy will store metadata, including the checksum, for a specific content version only once. Any subsequent updates to a version will be prohibited.

Import Workflow

  1. Using Mazer, the user triggers an import of a repository, passing the URL of the new release.
  2. Galaxy downloads the release artifact, calculates a checksum, and stores the checksum along with additional metadata about the release.
  3. Any subsequent updates to an already imported package are prohibited. (See the sketch below.)
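
The sketch referenced above: a rough illustration of steps 2 and 3 in Python, assuming a SHA-256 checksum and an "insert once, never update" rule. The function, its arguments, and the in-memory versions store are all hypothetical, not part of the proposal.

import hashlib
import requests

def import_release(namespace, name, version, download_url, versions):
    # `versions` stands in for Galaxy's database: a dict keyed by (namespace, name, version).
    key = (namespace, name, version)

    # Step 3: once a version has been imported it can never be replaced or altered.
    if key in versions:
        raise ValueError("%s.%s %s already imported; publish a new release instead" % key)

    # Step 2: download the release artifact and record its checksum with the release metadata.
    response = requests.get(download_url, timeout=60)
    response.raise_for_status()
    versions[key] = {
        "download_url": download_url,
        "sha256": hashlib.sha256(response.content).hexdigest(),
    }
    return versions[key]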

Install Workflow

  1. User executes the mazer install command to install an Ansible collection.
  2. Mazer downloads package metadata from Galaxy, which includes the download URL and checksum.
  3. Mazer downloads the release artifact.
  4. Mazer calculates the checksum of the downloaded package, and compares it with the checksum received from Galaxy. (See the sketch below.)
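
The sketch referenced in step 4: a minimal illustration in Python, assuming the checksum Galaxy returns is a SHA-256 digest; the function and argument names are illustrative only.

import hashlib

def verify_download(artifact_path, expected_sha256):
    # Hash the downloaded archive in chunks so large artifacts don't need to fit in memory.
    digest = hashlib.sha256()
    with open(artifact_path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            digest.update(chunk)
    if digest.hexdigest() != expected_sha256:
        raise RuntimeError("checksum mismatch: refusing to install a possibly altered artifact")
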
@tima
Contributor

tima commented Jul 25, 2018

Good stuff. Here are my comments after my first review:

galaxy.toml

I'm in strong opposition to introducing yet another file format and dependency. The Ansible core team has standardized on YAML and JSON and is transitioning away from INI. (YAML is now the preferred format for configuration and inventory.) There are also too many different file formats (JSON this, YAML that, TOML over here) in this proposal already. I can only see it complicating things with no apparent value to the user or function of this system.

METADATA

This file needs an extension and is too generic a filename. Also, there are too many forms of "meta" flying around galaxy/mazer that it is getting confusing. Can we just call it what it is -- a manifest?

Mazer will be extended with the ability to push a release artifact to GitHub. This provides a single, consistent method for content creators to automate release pushes that can be called from any CI platform.

I agree with this EXCEPT it should be pushed to Galaxy rather than GitHub. Galaxy can talk to GitHub or whatever other backend storage mechanisms we decide to support. That needs to be transparent to the user and administered by Galaxy as needed.

Also, should we even help publish artifacts before they've been verified as something Galaxy can import?

Notify the Galaxy server when new release artifacts are available

Is this really necessary if the push goes thru Galaxy? Galaxy can inspect it and collect all of that info and then push the artifact to the GitHub backend or whatever.

  1. Any subsequent updates of already imported package are prohibited.
  1. Mazer downloads the release artifact

I'd like to see more in this proposal on how "subsequent updates" will be handled and resolved, since Galaxy will be entirely dependent on an external system that does not have the same constraints.

For example, I ask for foo.apache-simple v2.0.1 because I tested and depend on it. foo unwisely/accidentally changes the artifact in their github account. What does galaxy/mazer serve me? How is that handled and resolved? What if they remove the release artifact entirely? Things of this nature should be considered.

@alikins
Contributor

alikins commented Jul 31, 2018

galaxy.toml

I'm in strong opposition to introducing yet another file format and dependency. The Ansible core team has standardized on YAML and JSON and is transitioning away from INI. (YAML is now the preferred format for configuration and inventory.) There are also too many different file formats (JSON this, YAML that, TOML over here) in this proposal already. I can only see it complicating things with no apparent value to the user or function of this system.

Tend to agree about file format proliferation in general.

Though for the manifest, it could be useful to create it in a format with a canonical representation. That could make it simpler to externally verify the validity of a manifest. If a tool could reproduce the manifest from the artifact contents bit perfectly, that could make the manifest more powerful. Hard to do with yaml/json/toml though.

Mazer will be extended with the ability to push a release artifact to GitHub. This provides a single, consistent method for content creators to automate release pushes that can be called from any CI platform

I agree with this EXCEPT it should be pushed to Galaxy rather than GitHub. Galaxy can talk to GitHub or whatever other backend storage mechanisms we decide to support. That needs to be transparent to the user and administered by Galaxy as needed.

Is this something supported by the github auth scheme?

The plus would be that in theory galaxy could create a slightly stronger link between the "source" and the built "release" (at least if you trust galaxy).

One downside is that then galaxy becomes a single point of failure. If galaxy is compromised, then potentially all published releases could be compromised.

@alikins
Contributor

alikins commented Jul 31, 2018

Overall proposal sounds good to me.

@alikins
Contributor

alikins commented Jul 31, 2018

One related issue that has had some discussion is how mazer/ansible can use Galaxy content under development. Essentially, whether ansible can use mazer-style content directly out of a git working tree.

@chouseknecht
Contributor Author

chouseknecht commented Aug 1, 2018

File types and names

I think we're all in agreement on the format of the repository level metadata and manifest files. We'll stick with YAML and JSON. YAML for files that humans edit/maintain. JSON for anything generated.

Agree on naming of the generated Manifest file too. Calling it METADATA is confusing. As @tima suggested, let's call it what it is: manifest.json.

Pushing to GitHub vs Galaxy

I agree with this EXCEPT it should be pushed to Galaxy rather than GitHub. Galaxy can talk to GitHub or whatever other backend storage mechanisms we decide to support. That needs to be transparent to the user and administered by Galaxy as needed.

Also should we even help publish artifacts before they've been verified as some thing Galaxy can import?

I like this idea, pushing to Galaxy, rather than GitHub. Travis CI takes this approach, kind of, if you look at their docs on GitHub release pushing.

Travis requires the user to provide an OAuth token that has a scope of either public_repo or repo. For community users, where repositories are public, public_repo works, and Galaxy already requires this scope when a user authenticates with the Galaxy website. Travis then uses the token to contact the GitHub API and perform the push.
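
For reference, the Travis CI GitHub Releases deployment is configured in .travis.yml roughly like this (the artifact path and token variable here are just examples):

deploy:
  provider: releases          # Travis' GitHub Releases provider
  api_key: $GITHUB_OAUTH_TOKEN
  file: release/v1.0.0.tar.gz # example artifact path
  skip_cleanup: true
  on:
    tags: true                # only deploy builds triggered by a tag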

Unlike Travis, Galaxy could actually inspect the artifact, and perform static analysis on it, prior to pushing to GitHub. We just need to be very clear on what our criteria are for analyzing/testing an artifact. We may even want to give the user the ability via Mazer to analyze and test the artifact prior to handing it off to Galaxy.

Once the artifact passes through Galaxy's static analysis/testing, the Galaxy server can use the OAuth token to push it to GitHub. I think the pushing process also needs to live in Mazer so that it can be run within the context of the CI session (Travis or otherwise), and the user can see immediate status feedback. It's important that the user knows whether or not the push succeeded, so we can't simply make a web hook API call to the Galaxy server and hope for the best.

@cutwater
Collaborator

cutwater commented Aug 1, 2018

I'm in strong opposition to introducing yet another file format and dependency. The Ansible core team has standardized on YAML and JSON and is transitioning away from INI. (YAML is now the preferred format for configuration and inventory.) There are also too many different file formats (JSON this, YAML that, TOML over here) in this proposal already. I can only see it complicating things with no apparent value to the user or function of this system.

It makes sense, we can use either YAML or JSON. It doesn't really matter. I proposed TOML in my example because it's user friendly and easy to write.

This file needs an extension and is too generic a filename. Also, there are too many forms of "meta" flying around galaxy/mazer that it is getting confusing. Can we just call it what it is -- a manifest?

The METADATA file is not intended to be read by humans; it's a machine-written and machine-read file that lives in the package root and describes the package contents and metadata in an easily importable format.
As we discussed during the meeting, it can be named differently: MANIFEST, MANIFEST.json, manifest.json.

I agree with this EXCEPT it should be pushed to Galaxy rather than GitHub. Galaxy can talk to GitHub or whatever other backend storage mechanisms we decide to support. That needs to be transparent to the user and administered by Galaxy as needed.

This workflow would have major drawbacks and disadvantages for end user:

  1. The user will have to give Galaxy write access to the repository, which is unlikely to be accepted by community users and thus unlikely to be widely used.
  2. Galaxy will have to implement explicit support for each content provider (GitHub, GitLab, etc.). On the other hand, the spec is not limited to GitHub and can easily be extended to import tarballs by URL.
  3. Galaxy can't and shouldn't manage GitHub releases exclusively, which leads to consistency issues.
  4. The package release workflow will heavily depend on Galaxy. In the proposed spec, Galaxy is an optional component in the release process, so the user has the choice to use it or not. It allows users to amend their workflow easily and roll back if needed.
  5. Travis, which is the de facto standard for GitHub CI, has already implemented a deployment process to GitHub Releases. In the chain GitHub -> Travis CI -> Galaxy -> GitHub Releases, Galaxy is a useless and unnecessary component.

@akaRem
Contributor

akaRem commented Aug 1, 2018

I have some concerns regarding the overall direction of this discussion.

I mostly agree with @cutwater. But anyways I want to add 2c.

Traditional way to organise repos

At the moment, it's normal to organise repositories in the simplest and most reliable way.
For example, PyPI, NPM, RPM, Maven (and many others) essentially decompose the packages into folders on the file system and create metadata for all packages, where a package is just an archive, like a tarball.
This approach is reliable and does not produce any problems; this scheme has worked for years. Attempts to do something more optimal or tricky lead to problems like those DEB repos have, when a repo is inconsistent during updates.
Plus all these repo engines usually have some web UI with a search engine.

Here is the direction you are going in this discussion.

You propose to build a repository on top of a distributed virtual file storage system, where you are not responsible for either consistency or data accessibility. You have limited ACLs for this storage and you may lose access at any time.
You will eventually improve and evolve this system. In some time you'll find this storage contains lots of packages in outdated formats, stored across lots of different storage backends like GitHub, GitLab, Nexus, custom web servers, Amazon, Google...
This future doesn't look very cool.
It's much more complicated than DEB repos. Something will definitely go wrong.

You also need to consider these questions:

  • If I have to check the package before publishing, why can't I do it locally with your utilities? Why should I upload it to your server where it will be checked with the very same code I have locally?
  • How will you determine where to put the package if I have a fork with a lot of upstreams? Will I be forced to select or configure it?
  • What if I have a private monorepo and I want to publish a couple of my roles?
  • What if the package has already been published? For example, I published it through an on-premise Galaxy in my company, and that Galaxy does not have the same version as yours, so the checksums do not match?
  • What happens if I remove a registered package? Will you poll for consistency in the background?
  • What if GitHub is temporarily unavailable? What if Galaxy is temporarily unavailable?
  • What if I do not have a GitHub account at all?
  • What should I do if I use Gerrit?
  • I upload the package to your server, and then you upload this package to my GitHub... Why do I need you in this chain?
  • How will you store my token? Will you encrypt it in the DB? Which encryption method did you choose? What if the DB leaks?
  • If I use Travis, to which you refer, it turns out that if I want to use Travis and Galaxy together, then Travis needs an access token to my GitHub account, Travis needs an access token to my Galaxy account, and Galaxy needs an access token to my GitHub account. And I need to set up all this stuff. It looks like I should configure and upload all my tokens everywhere. And we may add more services to the chain, or at least add Facebook, to be sure that the tokens will eventually leak.
  • I could publish the package with Travis like all other artifacts. Why do you force me to use your proxy for these "special" packages?
  • What if you accidentally publish the wrong thing to the wrong place? Who will be responsible for that? Who will be responsible for fixing things? Will you delegate it to the repo owner? If not, how will you fix it?
  • Do you really have no resources to make a simple static server? You could use Pulp; it has a plugin for Galaxy roles. It's probably not the very best (note: I haven't looked at it yet), but it's cheaper to fix and improve that plugin than to invent your own very cool distributed storage.
  • When will you roll out support for GitLab, Bitbucket, and their on-premise versions?
  • And what about Gerrit?
  • And what about security?

My proposal

Wrap all these things with Python setuptools and distribute them as Python packages. Use PyPI, or your own repo, or both.
Don't reinvent the wheel.
So many people will be thankful for this simple solution!

@alikins
Contributor

alikins commented Aug 2, 2018

The idea of using Python packages and PyPI is a good one and has a lot of benefits. Some of the previous discussions have hit on some concerns though, some more valid than others. In no particular order:

  1. For ansible modules, the main concerns are for non-Python modules: either PowerShell modules like the win_* modules, or custom stuff in other languages.

  2. A conceptual mismatch with using PyPI for ansible modules is that pip and Python packages are intended for installing locally for the local Python runtime, and most ansible modules will only ever be executed remotely. So in some sense, they are just a data payload and do not need to be installed as "real" Python modules.

  3. Most ansible modules (and things like roles) aren't directly usable by Python. You can't import them into a Python app in a meaningful way, so distributing and installing by pip is kind of disingenuous. Though you can [ab]use pip to install non-Python modules or any set of files, it's kind of an off-label use.

  4. Any of the cons of pip or pypi would be inherited.

    1. Most notably that it executes code from the package as part of its install. Granted, almost all software install tools do that to some degree, but pip does it more than most (at least historically anyway, the newer versions of pip and 'wheels' are better about that).

    2. Some of the custom/unusual workflow cases mentioned in the questions aren't handled particularly well by pip/pypi either. Though I suspect it will likely always support more of those workflows than any galaxy specific tooling could (unless/until galaxy is backed with something like Pulp...)

I think the list of questions to consider is good. I'll see what I can answer in another comment.

@chouseknecht
Contributor Author

chouseknecht commented Aug 2, 2018

Here's the full process we're thinking of, with examples of the tooling we would provide to make it fairly simple.

Not sure this got stated at the outset, but here are some benefits of having a more formal release process:

  • Galaxy server can pin versions, by preventing the re-import of an existing version. If a user wants to update their content, they'll have to create a new release. They won't be able to delete or replace releases.
  • Galaxy will store a checksum for each release artifact, and provide the Checksum value to Mazer, so that during content install Mazer can check the integrity of a downloaded artifact
  • Galaxy will be able to check the contents of an actual release artifact. Today it only inspects the contents of the repository.
  • In the future, a background process could periodically verify the version information stored in Galaxy against GitHub, and remove content from Galaxy that no longer exists.

For the purposes of what follows, we’ll assume the following:

  • The user has created a project with a directory structure that looks like our sample project
  • At the root of the project, the user has added a galaxy.yml file containing metadata for the Collection.
  • A .gitignore file exists in the directory, and contains an entry for the ‘release’ directory

The process consists of 3 steps: build the artifact, push the artifact to GitHub, and publish the artifact on Galaxy. The following describes these steps in detail:

Build the release artifact

  1. User updates galaxy.yml with the new content version in Semantic Version format.

  2. From within the root of the project, the user runs the mazer build command to create a build artifact.

    Mazer will perform the following actions:

    • Verifies galaxy.yml
    • Verifies project directory structure format
    • Creates a ‘release’ subdirectory, if it does not already exist
    • Within the ‘release’ subdirectory, creates a Tar GZip archive file that includes the version in the name. For example, v1.0.0.tar.gz
    • To the archive, adds a MANIFEST.json file containing metadata from galaxy.yml, including the version information. (An example layout follows.)
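
The example layout referenced above, assuming a collection named my_collection at version 1.0.0 (all names are hypothetical):

my_collection/
my_collection/galaxy.yml
my_collection/roles/
my_collection/release/
my_collection/release/v1.0.0.tar.gz

Inside v1.0.0.tar.gz:
MANIFEST.json    (generated at build time; metadata from galaxy.yml plus version and contents information)
roles/
plugins/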

Push the release artifact to GitHub

The artifact will be hosted on GitHub, and indexed in Galaxy. To make it easier to push the artifact to GitHub, Mazer will provide a command that interacts with the GitHub API, and performs the push. Other mechanisms for pushing an artifact to GitHub are available, and thus it’s not strictly required that Mazer be used for this purpose. Adding support into Mazer is for convenience only.

To perform the push to GitHub using Mazer, the user will run the command mazer push github, and after successfully performing the push, Mazer will return the GitHub URL to the archive.

Note: For this to work, Mazer will need a GitHub OAuth token with public_repo scope. Mazer will provide a command line option, and possibly a config setting, as pathways for the user to provide the token.
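
Under the hood, such a push could use the GitHub Releases API. Below is a rough, non-authoritative sketch in Python using the requests library; the repository, tag, and artifact path are placeholders, and this is not a statement of how Mazer will actually implement it.

import os
import requests

def push_release_to_github(repo, tag, artifact_path, token):
    # `repo` is "owner/name" and `tag` the release tag (e.g. "v1.0.0"); all values are placeholders.
    headers = {"Authorization": "token %s" % token}

    # Create the release for the tag (GitHub API: POST /repos/{owner}/{repo}/releases).
    resp = requests.post("https://api.github.com/repos/%s/releases" % repo,
                         json={"tag_name": tag, "name": tag}, headers=headers)
    resp.raise_for_status()
    release = resp.json()

    # Attach the archive as a release asset; upload_url is a URI template, so trim the template part.
    upload_url = release["upload_url"].split("{")[0]
    upload_headers = dict(headers)
    upload_headers["Content-Type"] = "application/gzip"
    with open(artifact_path, "rb") as f:
        asset = requests.post(upload_url,
                              params={"name": os.path.basename(artifact_path)},
                              data=f, headers=upload_headers)
    asset.raise_for_status()

    # The archive URL that would be reported back to the user.
    return asset.json()["browser_download_url"]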

Publish the release artifact on Galaxy

Once the archive is available on GitHub, the user will run the command mazer galaxy publish. Optionally, the user can pass the URL to the archive hosted on GitHub. If no URL is passed, Mazer will construct a default URL.

Mazer will do the following:

  • Interact with the Galaxy API to start the import process, passing the URL of the archive
  • Optionally, wait for the import to complete
  • Display results of the import

Galaxy server will do the following:

  • Download the release archive from GitHub
  • Perform any verification checks (same checks that were done locally by mazer build)
  • Run the static analysis / scoring process
  • Respond with results from the import

Galaxy server will store the following information:

  • Release commit SHA
  • Release download URL
  • Checksum of the release artifact
  • Release version
  • Results of verification process
  • Results of the scoring process

As stated at the beginning, using the above information, Galaxy will:

  • Prevent the re-import of an existing version. If a user wants to update their content, they'll need to create a new release.
  • Provide a Checksum value to Mazer, so that during content install Mazer can check the integrity of a downloaded artifact
  • Inspect release artifacts directly, rather than inspecting a clone of the repository.

Note: For Mazer to interact with the API, the user will need a Galaxy API token. The Galaxy UI will provide a mechanism for the user to manage API tokens. Mazer will provide a command line option, and possibly config setting, as pathways for the user to provide the token.

Note: Content authors will not be able to delete release artifacts. However, we may provide a mechanism where a release artifact is 'disabled' or 'blacklisted', so that it still exists in the Galaxy database, but it cannot be consumed or installed by Mazer.

@chouseknecht
Contributor Author

chouseknecht commented Aug 2, 2018

@akaRem

You propose to build a repository on top of a distributed virtual file storage system, where you are not responsible for either consistency or data accessibility. You have limited ACLs for this storage and you may lose access at any time.

This is no different than where we are today. Today GitHub hosts the content, Galaxy indexes it. It seems to work OK. It's not perfect, but it solves the problem of making it easy to find and download content.

Where we want to get to is a consistent way of formatting and versioning content, and thus the desire to add a more formal process to creating releases. If our content is more consistent, and reliably versioned, then we can start building an update command in Mazer that updates your installed Ansible content.

We could pivot to hosting the content on Galaxy. It's been discussed. There are infrastructure and cost challenges with making Galaxy the host. But regardless, we still need a process for turning content into a versioned release artifact that Galaxy can work with.

You will eventually improve and evolve this system. In some time you'll find this storage contains lots of packages in outdated formats, stored across lots of different storage backends like GitHub, GitLab, Nexus, custom web servers, Amazon, Google...

Maybe. Maybe not. GitHub seems to work for 90% or more of the community. There's been a small number of users who have asked for other public SCMs, but not enough to drive us to build support for them.

It's much more complicated than DEB repos. Something will definitely go wrong.

Nothing horrible has gone wrong to date, while keeping the content on GitHub. If we add support for additional SCMs, then yes, things could get more complex. Another reason to only support GitHub, or at least keep the number that we do support very small.

@akaRem
Contributor

akaRem commented Aug 2, 2018

@alikins This is kind of off-topic for this discussion, but anyway I want to comment on your concerns regarding distribution with Python packages...

  1. For ansible modules, the main concerns are for non-Python modules: either PowerShell modules like the win_* modules, or custom stuff in other languages.

Yes, but you say that

So in some sense, they are just a data payload

So this concern does not count 😉

And PyPI already has packages with foreign code, for example, JS extensions for Django. It's very convenient and cool (see PROS below).

  1. A conceptual mismatch with using PyPI for ansible modules is that pip and Python packages are intended for installing locally for the local Python runtime, and most ansible modules will only ever be executed remotely.

This is mostly false because
a) You may install output and registry plugins which are not executed remotely
b) All roles are installed locally and they are executed locally
c) Most modules are written in such a way that I can execute them as a utility or import them into my Python code. It's not very useful, but it's possible.
Yes, modules are intended to be used by ansible, but none of them forbids delegate_to: localhost.

  1. Most ansible modules (and things like roles) aren't directly usable by python. You can't import them into a python app in a meaningful way, so distributing and installing by pip is kind of disingenuous.

Not true.
Plus, for example, Django apps are not usable directly (at all), but they are distributed via PyPI.

Though you can [ab]use pip to install non-Python modules or any set of files, it's kind of an off-label use.

And people do this already, e.g. https://pypi.org/project/ansible-alicloud-module-utils/ and https://pypi.org/project/django.js/

PROS.

  1. A single requirements file for Ansible, plugins and extensions, custom modules and even win_* modules.
    No need to install plugins and 3rd party modules with pip and roles with another utility.

  2. All 3rd party roles are hidden in lib, and no one will accidentally change them.

  3. Each playbook may have its own venv.

  4. Ansible is installed with pip, and it's super cool to install everything with just one utility and just one requirements file in one req. format.


And I'm not forcing you to publish everything to PyPI (which is cool), but you can use existing utilities without writing your own stuff. E.g. import pip and just override the backend URL -> you automagically get search, install, update, a requirements file and so on.
You can skip writing your own installer and focus on other things.

Or you can even propose a patch to pip so it is also able to install Ansible stuff. And this will be super cool!

@akaRem
Contributor

akaRem commented Aug 2, 2018

@chouseknecht if it's just an improvement to the current release process then it looks sane, but if the author uses Mazer already, then you may check the artifact before uploading.

Galaxy server will do the following:

Download the release archive from GitHub
Perform any verification checks
Run the static analysis / scoring process
Respond with results from the import

— you may execute an (optional?) pre-publish step:
upload to Galaxy first... then Galaxy should validate the artifact, save its hash sums and metadata, and wait for the real publish (and respond whether the artifact is correct/incorrect).
Once uploaded, no special validation is required, just check the hash.
Extra work, but it guarantees that the uploaded artifact is OK.

@akaRem
Contributor

akaRem commented Aug 2, 2018

And after

Download the release archive from GitHub

.. why not store it locally (at least for backup)? 🙂

@cutwater
Collaborator

cutwater commented Aug 2, 2018

@akaRem Thank you for participating in this discussion.

— you may execute an (optional?) pre-publish step:
upload to Galaxy first... then Galaxy should validate the artifact, save its hash sums and metadata, and wait for the real publish (and respond whether the artifact is correct/incorrect).
Once uploaded, no special validation is required, just check the hash.

I like that idea; it allows us to run checks before a user publishes a package and doesn't require Galaxy to manage the user's repository. It'll kill two birds with one stone.

@akaRem
Contributor

akaRem commented Aug 2, 2018

And I want to recommend that you make backups. You may not serve stored packages, but it's better not to be at the center of a scandal... or at least it's better to be able to recover in a reasonable time.
Do you remember https://www.theregister.co.uk/2016/03/23/npm_left_pad_chaos/ ? 😈

@cutwater
Collaborator

cutwater commented Aug 2, 2018

And I want to recommend you to make backups.

Storing content for backup purposes only will definitely be less expensive than maintaining full featured infrastructure for content delivery. I think it's worth consideration.

@tima
Contributor

tima commented Aug 3, 2018

@akaRem Thank you for taking the time to make such detailed and impassioned suggestions. We really do appreciate it.

Using pip has been researched and considered a few times over the years, including by @alikins and myself. We’ve all come to the same conclusion that it is not a good fit for the Ansible and Red Hat community as a whole. That matter is settled.

This proposal is about the merits of continuing with a pull-based, SCM-backed system for storing and distributing artifacts that better supports providing verifiable, versioned content in a way that is reliable and consistent with the Ansible way. There you make some excellent points we need to take into consideration and work into whatever we come up with.

It would be really helpful if we could focus the conversation there.

Thanks for your continued feedback.

@daviddavis

Thanks for writing up the proposal. I was discussing it with @bmbouter today and we had a few questions around how it will work.

  • Will this new format coexist with the old one (or the one that currently exists today for roles)? Or will you stop supporting the old one?
  • Why not mimic the packaging format of other code that is delivered statically, i.e. tar one or more roles into a single tarball? Each role has all of its own metadata inside of it, the same metadata roles use today. The directories can be traversed easily so you probably don't need to include a "repo-level" metadata of any kind. This seems straightforward. Are there issues with this approach?
  • Why not have tools to organize roles into repositories instead of packaging stacks of content together? Moreover, how will users combine pieces of each repo together or use different combinations of role versions together?
  • Regarding v3, when I download roles from Galaxy I use the ansible-galaxy CLI but when downloading multi-content repos I need to use mazer? Why not just extend the ansible-galaxy CLI instead of creating a separate tool?

Thanks.

@chouseknecht
Contributor Author

@daviddavis

Good questions. Here are my thoughts...

Will this new format coexist with the old one (or the one that currently exists today for roles)? Or will you stop supporting the old one?

Yes, it will coexist. Don't know for how long, but definitely at the start, and possibly for a good while.

Why not mimic the packaging format of other code that is delivered statically, i.e. tar one or more roles into a single tarball? Each role has all of its own metadata inside of it, the same metadata roles use today. The directories can be traversed easily so you probably don't need to include a "repo-level" metadata of any kind. This seems straightforward. Are there issues with this approach?

I think what you're suggesting is, 'Why not use an RPM spec file, or something similar?' I think we're trying to hold true to the original concept of a role, which is that a role = an SCM repository of YAML files. However, instead of limiting a project or repository to one single role, we're allowing for it to contain multiple roles, and potentially Ansible modules and plugins. Here's an example of what we're contemplating.

As an Ansible content developer, I should be able to create such a project, and consume its contents in an Ansible playbook directly, without having to un-package or install it. I think that's why using a packaging tool like RPM doesn't fit. Otherwise, I think you're right.

Why not have tools to organize roles into repositories instead of packaging stacks of content together? Moreover, how will users combine pieces of each repo together or use different combinations of role versions together?

I think we specifically don't want to encourage pulling pieces of repos together. We may enable repository level dependencies, where you could, for example, have repository B dependent on repository A, and reasonably expect the installer to also install all of repository A when asked to install repository B.

What we're trying to solve is a world where third party partners can deliver their collection of Ansible content in a package. One example might be the Microsoft Azure modules that currently live in ansible/ansible. We would like it to be possible, and easy, for Microsoft to ship that suite or collection of modules via Galaxy, rather than have them baked into the Ansible source. They should be able to ship a suite of modules, plugins, and roles together, in which case we think it makes sense to have it all versioned in unison.

Regarding v3, when I download roles from Galaxy I use the ansible-galaxy CLI but when downloading multi-content repos I need to use mazer? Why not just extend the ansible-galaxy CLI instead of creating a separate tool?

We want the tooling to move independently and potentially faster than Ansible. We also want room to experiment with stuff, and not have it end up baked into a supported release of Ansible that we then have to live with for a number of years. Once something lands in an official Ansible release, it's hard to remove it.

@chouseknecht
Contributor Author

chouseknecht commented Aug 20, 2018

Following up on the discussion with @daviddavis, @bmbouter, @alikins and @cutwater...

We decided the following:

  • Eliminate the push github command. There's no need for Mazer to push directly to GitHub.
  • Mazer will publish to Galaxy. Galaxy will publish the artifact to GitHub, and possibly in the future, store a copy of the archive in Pulp or similar service.
  • The publish command will perform a multi-part upload of the file to a Galaxy API endpoint (a sketch follows below)
  • Mazer will provide the ability to override the upload URL, so that archives can be uploaded to any service that can accept multi-part file uploads (e.g., Pulp)

Just to be clear, we're not forcing contributors to use this process day one. Galaxy will continue to support the existing import process that relies only on GitHub repositories. This new process will be optional. Consider it the first phase in moving Galaxy toward hosting content.
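
The sketch referenced above: a rough illustration in Python, using the requests library, of what the multi-part upload could look like. The endpoint URL, authorization header, and form field name are assumptions, since the API is not specified in this issue.

import requests

def publish_to_galaxy(artifact_path, api_token,
                      upload_url="https://galaxy.example.com/api/upload/"):
    # The default URL is only a placeholder; per the decision above it should be overridable
    # so the archive can go to any service that accepts multi-part uploads (e.g. Pulp).
    with open(artifact_path, "rb") as f:
        resp = requests.post(upload_url,
                             files={"file": f},
                             headers={"Authorization": "Token %s" % api_token})
    resp.raise_for_status()
    return resp.json()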

@alikins
Contributor

alikins commented Aug 29, 2018

There is an ansible proposal for standardizing how role install requirements files work, notably establishing that install-time role requirements will live in a requirements file at meta/requirements.yml

See ansible/proposals#57

The requirements file mentioned here is based on the requirements file as described at https://galaxy.ansible.com/docs/using/installing.html#installing-multiple-roles-from-a-file

For example, a role that now includes a meta/main.yml and some requirements YAML file somewhere in the role would standardize on meta/requirements.yml.
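
For illustration, a role's meta/requirements.yml in that format might look like the following (same syntax as the multi-role requirements file described in the Galaxy docs linked above):

# meta/requirements.yml -- install-time requirements for this role
- src: geerlingguy.nginx
  version: 1.2.3
- src: testing.ansible-test-content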

Since the info in meta/requirements.yml is installation requirements, that info could also be included in Galaxy collection artifacts. The install requirements for a set of roles could be populated into the collection's MANIFEST.json.

For example, a collection like this:

my_collection/
my_collection/galaxy.yml
my_collection/roles/
my_collection/roles/some_role_a/
my_collection/roles/some_role_a/meta/
my_collection/roles/some_role_a/meta/main.yml
my_collection/roles/some_role_a/meta/requirements.yml
my_collection/roles/some_role_a/tasks/
my_collection/roles/some_role_a/tasks/main.yml
my_collection/roles/some_role_x/
my_collection/roles/some_role_x/meta/
my_collection/roles/some_role_x/meta/main.yml
my_collection/roles/some_role_x/meta/requirements.yml
my_collection/roles/some_role_x/tasks/
my_collection/roles/some_role_x/tasks/main.yml

mazer build would add the build as specified, but it could also pull the requirements from the roles' requirements.yml files and include them in a 'requirements' field in MANIFEST.json

{
    "collection_info": {
        "namespace": "some_namespace",
        "name": "my_collection",
        "version": "11.11.11",
        "format_version": 0.0,
        "author": "Cowboy King Buzzo Lightyear",
        "license": "GPLv2"
    },

    "# Don't sweat the details of the requirements data, 
     # this is just a strawman example. It could end up 
     # being simpler or more complicated.": [],
    "install_requirements": [{"requirement": {"name": "geerlingguy.nginx",
                                              "version": "1.2.3"},
                                              "needed_by": "roles/some_role_a"},
                             {"requirement": {"name": "testing.ansible-test-content",
                                              "version": "1.2.3"},
                              "needed_by": "roles/some_role_b"}],
    "format_version": 0.0,
    "files": [
                "# lots of file info here"
                ]
}

So ansible/proposals#57 is something that we likely want to support with mazer build, and eventually by 'mazer install' as a way for it to resolve collection requirements and deps.

@alikins
Contributor

alikins commented Aug 29, 2018

'meta/requirements.yml' may also be something we want to support at the collection level.
It would be used much like the role specific meta/requirements.yml proposal mentioned in #957 (comment)

For example, a collection:

my_collection/
my_collection/meta/
my_collection/meta/requirements.yml
my_collection/modules/
my_collection/modules/some_module.py
my_collection/callback_plugins/some_callback.py
my_collection/roles/
my_collection/roles/some_role_b_that_needs_module_from_collection_foo_bar/meta/
my_collection/roles/some_role_b_that_needs_module_from_collection_foo_bar/meta/requirements.yml
my_collection/roles/some_role_b_that_needs_module_from_collection_foo_bar/role_a/tasks/
my_collection/roles/some_role_b_that_needs_module_from_collection_foo_bar/tasks/main.yml

In that case, my_collection/meta/requirements.yml could be

- some_namespace.my_collection_utils

and my_collection/roles/some_role_b_that_needs_module_from_collection_foo_bar/meta/requirements.yml

- foo.bar

(or perhaps have a way to indicate whether an install requirement is a collection or a specific role, if we want to
support requiring a specific role in addition to requiring collections)

Those would get combined into 'install_requires' in MANIFEST.json by 'mazer build'
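
Purely as an illustration, and with the same caveats as the strawman above, the combined field produced by 'mazer build' might look something like:

"install_requirements": [
    {"requirement": {"name": "some_namespace.my_collection_utils"},
     "needed_by": "meta/requirements.yml"},
    {"requirement": {"name": "foo.bar"},
     "needed_by": "roles/some_role_b_that_needs_module_from_collection_foo_bar"}
]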

@chouseknecht chouseknecht modified the milestones: 3.1.0, 3.2.0 Nov 15, 2018
@chouseknecht
Contributor Author

Think we resolved this with the introduction of Collections. Closing.
