Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

document contribution policy for adding software to EESSI #108

Merged
merged 11 commits into from
Nov 9, 2023

Conversation

boegel
Copy link
Contributor

@boegel boegel commented Jul 5, 2023

I have marked this as draft (work-in-progress), because this policy is only a proposal at this point, and needs to be discussed among project partners a bit more before adding it to the EESSI documentation.

All feedback is welcome!

@terjekv
Copy link
Member

terjekv commented Jul 5, 2023

Hm. Considering their potential for abuse, should we require that when using --from-pr or --include-easyblocks-from-pr the submission must also state why these are used and preferably link to issues documenting the need for those options?

@boegel
Copy link
Contributor Author

boegel commented Jul 5, 2023

Hm. Considering their potential for abuse, should we require that when using --from-pr or --include-easyblocks-from-pr the submission must also state why these are used and preferably link to issues documenting the need for those options?

Yes, that should be a requirement, indeed.
We can actually go one step further: we only allow the use of --*from-pr when the PR is merged.

The extra benefit of that is that it can put some pressure on actually getting PRs merged on the EasyBuild side (easybuilders/easybuild-easyblocks#2248 comes to mind here).


Only **open source software** can be added to the EESSI repository.

Make sure that you are aware of software license, and that redistribution is allowed.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"aware of relevant software licenses" ??

@terjekv
Copy link
Member

terjekv commented Jul 5, 2023

Requiring merging can be harsh, but for 99% of cases, it should be fine. But easybuilders/easybuild-easyblocks#2248 also suggest that this can stall stuff. We can start off with a "must be merged"-policy and reevaluate with a possible migration to "should be merged" at a later date, if needed?

(We should probably also use som auto-labelling of PRs and set "uses-from-pr" or a similar tag if it is seen in the diff, just to help maintainers?)

## Software versions & toolchains

Recent software versions and toolchains *should* be preferred,
although the installation of older versions of use of older toolchains is allowed if sufficiently motivated.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"versions of use of older toolchains" this is rather unclear, what is really meant by this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reworded, should be clearer now

Comment on lines 62 to 63
Recent software versions and toolchains *should* be preferred,
although the installation of older versions of use of older toolchains is allowed if sufficiently motivated.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know we've only started to discuss this, and the machinery to help enable this doesn't yet exist, but we should maintain that once a toolchain is connected to a particular compat layer we don't include that toolchain in future compat layers. If we allow people to create PRs for older toolchains to newer compat layers we will bring a lot of baggage.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, that's perhaps a bit too restrictive, there can be very valid reasons to have a recent toolchain like foss/2023a both in the current EESSI version, but also have it in a future version?

Copy link
Member

@ocaisa ocaisa Oct 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What type of reason? I can't think of a need for this. You can have the later toolchain, just with a different compat layer. If we select compat layer via a module, rather than sourcing a script (which is entirely possible) they can happily live side by side using a (reduced) hierarchical view.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Especially when we already know there are cases where it can't work, for example the whole OpenSSH thing we ran into with 2023.04

Copy link
Collaborator

@trz42 trz42 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good start. I think, it's too early to make this public (and thereby a requirement). We should be much more restrictive across the types of requirements and then have the machinery in place for enforcing it.

Internally, we can start with something like the policy as described and try to ensure to meet all requirements.

@@ -0,0 +1,80 @@
# Contribution policy

When [openining a pull request to add software to EESSI](adding_software.md), the following requirements must
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd start with a brief description of what this policy is for. For example,

The purpose of the contribution policy is to provide guidelines for adding software to the shared EESSI repository. It informs about what requirements a software to be added must meet. ...

Small typo openining $\longrightarrow$ opening. I'd rephrase this, however, to leave out the technical aspect (opening a pull request). E.g.,

Any software to be added to the EESSI repository must meet the following requirements:

1. Open Source Software only (see section X for details)
2. ...
3. ...

Comment on lines 16 to 21
For more information about a specific license, see the [SPDX license list](https://spdx.org/licenses/).

!!! note

We intend to automatically verify that this requirement is met,
by requiring that the [SPDX license identifier](https://spdx.dev/ids/) is provided for all software.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Specifying the SPDX license identifier is probably not enough. Verifying it (that the identifier reflects the license) automatically is likely difficult.

I'd suggest to be rather restrictive at the start:

  1. Every software to be added must provide license information covering the full sources of the software package.
  2. For all dependencies of the software, license information covering the full sources must be provided too.
  3. License information must be given as SPDX license identifier.
  4. At the start (policy version 0.1) only the following license identifiers are accepted: list SPDX license identifiers

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that we want to start doing this, but this shouldn't be required in the first version of the policy.

We need to set up a mechanism for this first, for example a licenses.yaml file that maps software names to SPDX identifiers, and ideally also a way to detect that one or more entries are missing...

So, for now, I would keep it like it is now, and then work towards requiring that SPDX license identifiers can be provided somehow, and then update the policy accordingly once that is in place.

@@ -0,0 +1,80 @@
# Contribution policy
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add a version to the policy?

Comment on lines 40 to 43
!!! note

This restriction may be relaxed later to also allow adding software that is not supported yet in the latest
EasyBuild release, or to allow for installing software with other tools.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd remove this. Raises expectations we may not (want to) meet.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added this mostly to counter feedback like "why are you only using EasyBuild", but perhaps the contribution policy is not the place for that, so will remove.

Comment on lines 47 to 51
A [compiler toolchain](https://docs.easybuild.io/terminology/#toolchains) that is still supported by the latest
EasyBuild must be used for building the software.

More information on deprecated toolchains in EasyBuild is available
[here](https://docs.easybuild.io/deprecated-easyconfigs/#deprecated_easyconfigs_toolchains).
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we be more restrictive here? E.g.,

Only toolchains already available in the EESSI repository may be used.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd say that any toolchain still supported in EasyBuild is eligible to be included in EESSI, at least in the initial version of the policy.

Going back to older toolchains is likely going to be significantly more painful, if only to get even the installation of GCC to work, so this is sort of self-regulating...

Also, we can't really use a statement like "Only toolchains already available in the EESSI repository may be used.", because then adding toolchains (however recent) would be against the contribution policy? :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it should be up to the EESSI maintainers what toolchains are and are not supported by a particular EESSI release, this can be handled via our hook, see for example https://github.com/easybuilders/JSC/blob/2024/Custom_Hooks/eb_hooks.py#L570-L640

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Allowing someone to use a new toolchain is a big step, as obviously this is extremely likely to trigger a massive number of new software packages (and maintainer effort).

Comment on lines 65 to 80
## Testing

We should be able to test the software installations via the [EESSI test suite](https://github.com/EESSI/test-suite)
being developed.

Ideally one or more tests are available that verify that the software is functionally correct,
and performs well.

It should be possible to run a minimal *smoke test*, for example using EasyBuild's `--sanity-check-only` feature.

!!! note

The [EESSI test suite](https://github.com/EESSI/test-suite) is still in active development,
and currently only has a minimal set of tests available.

When the test suite is more mature, this requirement will be enforced more strictly.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reads very vague, more like a goal/aim. I'd rather require that a software must be tested for functional correctness, single-core performance and multi-core scalability. Since the machinery (and tests) do not exist yet, the request to add a software should detail how the software can be tested.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's way more restrictive than what we're doing currently, for many installations (dependencies) we don't really test at all...

So we need to keep this relatively loose for now.

Eventually we can hopefully require that for example --sanity-check-only works on all installations, but that needs more work, and that the latest release of the EESSI test suite passes with certain tags (like CI).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the wording is OK for now.

One of the plans in the EESSI test suite is to add a "smoke test" that basically just runs eb --sanity-check-only on all installations.

Once that is in place, the minimal requirement for this part could be that this test must pass for all software installations being added, but then we should first fix some known problems in EasyBuild like easybuilders/easybuild-easyblocks#2986

@boegel boegel marked this pull request as ready for review October 24, 2023 09:11
Copy link
Collaborator

@trz42 trz42 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Almost done. A few small suggestions for rephrasing. One wish for the testing section.

I wonder if/how we could encourage contributors to help us define meaningful tests. For example,

  • for package X/y.z, obtain test dataset from https://x.org/testdata_vX.Y.tgz
  • unpack test data
  • run X -arg 1 -arg 2 testdata.csv
  • expected output files are: testout.jpg and testout.log where testout.log includes 1 line matching SUCCESS and no line matching ERROR


Only **open source software** can be added to the EESSI repository.

Make sure that you are aware of relevant software license, and that redistribution is allowed.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small typo?

Suggested change
Make sure that you are aware of relevant software license, and that redistribution is allowed.
Make sure that you are aware of relevant software licenses, and that redistribution is allowed.

or even?

Suggested change
Make sure that you are aware of relevant software license, and that redistribution is allowed.
Make sure that the software uses an open source license, and that redistribution is allowed.

Copy link
Member

@ocaisa ocaisa Oct 26, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit-picking but the original formulation is better, we do allow binary distribution as long as the redistribution is permitted by the licence (CUDA for example falls into this category, we distribute the runtime)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ocaisa Then this shouldn't refer to "open source" software at all?
That is going to make the wording here a lot less clear though...

Comment on lines 71 to 76
### e) CPU targets { #cpu_targets }

The software *should* work on all [CPU targets supported by EESSI](software_layer/cpu_targets.md).

Exceptions to this requirement are allowed if technical problems that can not be resolved with reasonable effort
prevent the installation of the software for specific CPU targets.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we use architecture instead of targets?

Suggested change
### e) CPU targets { #cpu_targets }
The software *should* work on all [CPU targets supported by EESSI](software_layer/cpu_targets.md).
Exceptions to this requirement are allowed if technical problems that can not be resolved with reasonable effort
prevent the installation of the software for specific CPU targets.
### e) CPU architectures { #cpu_targets }
The software *should* work on all [CPU architectures supported by EESSI](software_layer/cpu_targets.md).
Exceptions to this requirement are allowed if technical problems that can not be resolved with reasonable effort
prevent the installation of the software for specific CPU architectures.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@trz42 We use "targets" in https://www.eessi.io/docs/software_layer/cpu_targets, so sticking to targets is more consistent?

I guess we could use "target CPU architectures"? (it should really be microarchitectures, though)

Comment on lines 91 to 96
We should be able to test the software installations via the [EESSI test suite](../test-suite).

Ideally one or more tests are available that verify that the software is functionally correct,
and performs well.

It should be possible to run a minimal *smoke test*, for example using EasyBuild's `--sanity-check-only` feature.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question is, should we ask contributors to help us define a meaningful test?

Such tests could be added to the EESSI test suite. "Ideally" sounds like a very loose requirement ... and missed opportunity to me.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Making this a requirement for someone who is really just an end-user is a high bar. I agree for that marquee applications we should have such tests, but it's not realistic to expect that for all cases. In general, it's probably a much more pro-active effort on our part where we'd need to ensure we are reaching out to the actual developers (or very experienced users)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This indeed necessarily needs to be a rather loose requirement, or we will be way too restrictive, and won't get any contributions because of that.

It's definitely unreasonable to require that a test must be available in the EESSI test suite, since adding something there is far from trivial (requires knowing Python, ReFrame, how to write a portable test, etc.).
We can probably provide some documentation with guidelines on how to do that eventually, but then it will still be quite an elaborate task.

We should phrase this such that it's clear that we prefer being able to test the software, in one way or another, without imposing a huge amount of effort on contributors, at least initially.
Maybe we can mention options like providing a test case, referring to an example or documentation of a basic run, etc.?

### f) Versions & toolchains { #versions_toolchains }

Recent software versions and toolchains *should* be preferred,
although the installation of older software versions and toolchains is allowed if sufficiently motivated.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not keen on this, we open the door to a lot of pain by looking backwards. In general, I do believe we should decide what toolchains to support on a specific compat and stick to it, and probably take the opportunity once a year to update the compat layer

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That feels like swinging over to the other side a bit too much (by "locking" a toolchain version to a particular EESSI version, and hence compat layer).

Keeping compat layers updated is gradually going to become quite a bit of effort, and so we may want to install an existing toolchain version in a newer compat layer at some point, since that's likely to be a lot less painful than updating a previous compat layer version.

I guess we could stick to "a toolchain version is only installed on top of a single compat layer version" initially, and then see if that makes sense going forward, and then adjust accordingly if needed.

But should this be part of the contribution policy?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm on the fence on this one. On the one hand, for a good user / contributor experience, discouraging of forbidding the deployment of old toolchains will probably lead to fewer issues. On the other hand: if someone needs an older toolchain, and if it works, I see little reason not to accept a PR like that.

I had a chat yesterday with someone from a university HPC support staff who asked 'if my user asks for something from a 2019b toolchain, because he needs the particular software version in that toolchain, can I deploy that through EESSI?'. Even though he fully agreed we should give pushback to users requesting this stuff, in the end he's also realistic and just wants to help the researcher. If it were a local software stack, he would have definitely given it a try. If our policy 'forbids' that, he would be pushed to a local solution. I'm a bit concerned if it wouldn't turn HPC support staff away from using EESSI if we are too restrictive on these things.

I guess we could stick to "a toolchain version is only installed on top of a single compat layer version" initially, and then see if that makes sense going forward, and then adjust accordingly if needed.

What do you mean here? If foss-2022a went into 2023.06, it cannot also go into 2023.12? I wouldn't do that. I think Alan's suggestion was more like: we can decide for 2023.06 to support toolchains 2021a onwards, for 2023.12 we do 2021b onwards, etc. So then there is overlap (2022a would still be on top of both versions of the compat layer), but it prevents deployment of very old toolchains on new compat layers (which probably won't work anyway).

The more I think about it, the more I am not in favour of limiting this initially in the policy. I think people will find it's hard to get older compilers to build, and older MPIs to function properly. In practice, I think they might still be blocked from contributing such toolchains, simply because it's impossible to get them to work (on new compat layers, at least). I wouldn't forbid that, just let the technical limitation be the blocker, not our 'rule'. And, I agree we could still mention this (i.e. old stuff is discouraged, likely to cause issues, and the community probably won't be very motivated to help you out) clearly in the policy.

Another advantage of not limiting it initially is that it allows us to build experience, which we can later use to make an informed decision to forbid it after all (if needed).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@casparvl We discussed this aspect during the bot sync meeting earlier today, and I've made some changes to the policy in 86f99c3 based on that.

There are now separate entries for recent toolchains and recent software versions, where the former mentions that contributors should either use already installed toolchains, or submit a motivated support request to get an additional toolchain installed, which puts the ball in our camp, and leaves the door open for contributors.

There was a consensus during the meeting that this makes sense for the initial version of the policy, and we can revise that later should the need arise.
In practice, it's very likely anyway that is will be EESSI admins who will add toolchains.

We also briefly discussed that we should only install the latest N toolchains in the latest version of EESSI at any time, which allows for minimal overlap across EESSI versions (but that's not part of the policy, that's more of an internal thing we should try and stick to).
In practice this will probably mostly be governed by the compatibility of the glibc version in the compat layer and the GCC version in the toolchains...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In EESSI/software-layer#371 I've included the ability for both the site and the user to extend EESSI as they see fit. So even if we don't support it (most likely because of the architecture situation) doesn't mean that they can't do that locally if it "just works".

Copy link
Collaborator

@casparvl casparvl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that now #117 was merged, we put a placeholder for the contribution policy at https://www.eessi.io/docs/contributing_sw/contribution_policy/

That'll require a rebase or merge into this feature branch.

Copy link
Member

@ocaisa ocaisa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

docs/contribution_policy.md Outdated Show resolved Hide resolved
ocaisa
ocaisa previously approved these changes Nov 9, 2023
Copy link
Collaborator

@trz42 trz42 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work @boegel !

Just a tiny suggestion.

docs/adding_software/contribution_policy.md Outdated Show resolved Hide resolved
Co-authored-by: Thomas Röblitz <trz42@users.noreply.github.com>
@bedroge
Copy link
Collaborator

bedroge commented Nov 9, 2023

Lgtm 👍🏻👍🏻

@trz42 trz42 dismissed casparvl’s stale review November 9, 2023 20:06

Hi @casparvl, I think your requested change was implemented and your concerns about the toolchain versions were addressed. So, opted to dismiss your review. Hopefully ok with you. Cheers Thomas

@trz42 trz42 merged commit 8fd9907 into EESSI:main Nov 9, 2023
1 check passed
@casparvl
Copy link
Collaborator

Great work @boegel ! I think the changes you made after my review make sense. And as you mentioned: let's build experience, and fine tune if needed :)

@boegel boegel deleted the contrib_policy branch December 22, 2023 17:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants