ELPP-3358 Choosing an open source component #9

Open
wants to merge 4 commits into
base: master
from


davidcmoulton
Member

@davidcmoulton davidcmoulton commented Jan 9, 2018

This is a draft for an ADR to help decision making when selecting an open source component. Feedback welcome. Pinging @elifesciences/article-viewer @elifesciences/digirati @elifesciences/coko @elifesciences/yld

Feel free to tag other relevant people on this as appropriate.


### Licensing

[should we specify this in terms of an explicit list of acceptable licenses, or in terms of what we need a licence to allow us to do?]
Contributor

considering we release everything with the MIT license, there is a compatibility issue (with GPL?)

Contributor

an explanation on what it means to be 'compatible' with GPL licensed code: https://www.gnu.org/licenses/gpl-faq.en.html#WhatIsCompatible

and a list of compatible licences:
https://www.gnu.org/licenses/license-list.en.html#GPLCompatibleLicenses

there is a lot of argument and untested legal theorizing on edge cases in most software licences. The only perfectly clear cut legal standing, in my humble opinion, is "public domain". Even the legality of closed sourced proprietary code becomes fuzzy if it's not 100% written internally in the same country in the same decade.

also, there are different types of code usage, like linking vs bundling vs compiling. is a compiled program with gpl licensed code required to also be licenced by the GPL? (yes). what if it's unmodified and uncompiled, or compiled to an intermediate format for an interpreter (javascript, python)? what if it's called from a script? is a script a program? all sorts of fun and games here.

Ian settled on the MIT licence for eLife, and while I don't agree with that decision ideologically, it is convenient for others for all sorts of reasons. Convenience and fewer barriers are very important to adoption if adoption is one of your goals.

I would be curious to find the licences of all of the dependencies our programs use and see which large groups emerge. I'm less worried about proprietary software sneaking in than I am about those who use a licence like MIT or BSD and then 'tweak' it, adding their own custom clauses.

+1 for a licence audit. I know observer and metrics and possibly lax are still GPLd.


## Status

Proposed.
Contributor

I'd tag as many people as possible here, since this has wide-ranging implications

@davidcmoulton davidcmoulton changed the title Choosing an open source component ELPP-3358 Choosing an open source component Jan 9, 2018

### Well maintained

- Are there a decent number of recent commits? This will indicate whether it's under active development. Avoid it if it's not.
Contributor

I see this stance a lot in javascript developers. recent activity is no indication of stability and is no reason to avoid a thing.

Contributor

I think this is spuriously correlated with how big a project is, e.g. the Linux kernel has many commits per day even if some areas are very stable and the release pipeline is very long.

Contributor

In the other direction, if there is a single master branch with many commits per week it's an indication of instability. Number of existing releases could be a better litmus test for stability?

Member Author

In the JavaScript world things are moving so fast that something which lacks recent commits probably is out of date. I take the point that this doesn't necessarily apply in other environments.

@lsh-0
Contributor

lsh-0 commented Jan 10, 2018

I think the points made are fair overall, but some of the conclusions seem impractical outside of the javascript world, like code auditing dependencies, conflating size, popularity and activity as measures of stability or security and that a dependency's ease of installation/configuration might preclude it from being maintained properly internally. Some projects have become very stable, or were never really popular, or are just very niche, or have maintainers who don't like outside help or, or, or ...

Most of this ADR is common sense and I'd expect nothing less from a developer - being able to nose out problem dependencies should be like a sixth sense after a time - but if we're to make this a practical ADR then it should be less prescriptive and more of a guide to choosing and managing dependencies, including the mountain we already use daily, with a list of must haves (a compatible Free/OSS licence), nice to haves (many maintainers, community enthusiasm), common gotchas (customised licences) and warning signs (few commits + many outstanding issues/no issue tracker, exotic code).

I'd like to see greater separation of popularity from quality, activity from stability and a change of 'sustainable' to something else. Perhaps 'complexity'. Giorgio and I have managed to accommodate almost everything you guys have thrown at us thus far, even IIIF.

I'd also like to quantify 'quality' before we start evaluating components for it. It could be a simple points system, but preferably a set of metrics we can apply uniformly. It wouldn't condemn a thing but may invite more scrutiny of it
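
A "set of metrics we can apply uniformly" could be pictured as a small function that returns reasons for extra scrutiny rather than a pass/fail verdict. This is only a sketch: the field names, the thresholds and the allowed-licence set are all invented for illustration and are not agreed values from the ADR.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical must-have: licences assumed acceptable for an MIT-licensed project.
ALLOWED_LICENCES = {"MIT", "BSD-2-Clause", "BSD-3-Clause", "Apache-2.0"}


@dataclass
class ComponentMetrics:
    name: str
    licence: str
    open_issues: int
    commits: int
    maintainers: int
    releases: int


def scrutiny_flags(m: ComponentMetrics) -> List[str]:
    """Return reasons to look more closely; an empty list means no flags."""
    flags = []
    if m.licence not in ALLOWED_LICENCES:
        flags.append(f"licence {m.licence!r} is not on the allowed list")
    # The numbers below are illustrative guesses, not agreed thresholds.
    if m.commits < 20 and m.open_issues > 50:
        flags.append("few commits combined with many open issues")
    if m.maintainers < 2:
        flags.append("single maintainer")
    if m.releases == 0:
        flags.append("no tagged releases")
    return flags
```

A flagged component is not rejected; the list is meant to be pasted into the pull request so the maintainers can decide.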

@giorgiosironi
Contributor

👍 on must-have vs nice-to-have vs warning signs

Proposal: during the inclusion of a library in a pull request, compile the checklist included in this ADR and let the maintainers of the project make an informed choice:

  • license: MIT
  • issue: ~1000 are open
    ...

@gnott
Member

gnott commented Jan 10, 2018

This page describes some constructive metrics for evaluating the selection of tools. Due to the variability of what is available for particular situations, to consider these as suggestions and guidelines is probably better than to think of them as strict rules to follow.

In some situations -- including and not limited to areas of science and science publication -- there could be only one best choice from limited options.

Another thing I like about this procedure is to help everyone to understand all the important factors in the decision making, because the way each person evaluates things may vary. We will have to ensure everyone remembers its contents, and when evaluating new components to perhaps refer back to it.

@davidcmoulton
Member Author

  • reframed with must have; nice to have, and warning signs
  • removed various inferences that may not apply in all contexts
  • added @giorgiosironi's checklist idea

Comments welcome.

- [] open issue count:
- [] commit count:
- [] maintainer count:
- anticipated effort to maintain in our environment:
Member

If there was something quite complex, but it was only used in one place in your environment, that might be described as "low", and something that's simpler to manage but used almost everywhere might be described as "high".

Could add something for how often it's planned to be used?

Contributor

You're onto something about the additional directions that can influence how big the dependency is.

This (which refers to technologies in general, not just libraries or frameworks) came to mind:

@lsh-0
Contributor

lsh-0 commented Jan 22, 2018

when going through this checklist for a new python dependency, a few thoughts occurred:

  • convenience almost always trumps everything else
  • we love easy to check checkboxes. ambiguity and guessing wastes time and is stressful
  • I'd game this ADR to get what I want because of the above points
  • I'd prefer to spend my time automating this than gaming it

elaborating:

  • make this convenient

opening a PR and having Alfred run my tests is fantastically convenient.
alfred should check my dependencies and post a warning if anything is dubious.
just a simple "this library only has a few commits. are you sure?" type message
if we wanted to get fancy we could add a button "yes, I'm sure" that makes Alfred swallow that warning for that dependency+project in future
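
A minimal sketch of that warn-then-suppress behaviour, assuming the acknowledgements are kept in a JSON file keyed by project and dependency. The file format, the function names and the threshold of 20 commits are made up for illustration; nothing here reflects how Alfred actually works.

```python
import json
from pathlib import Path
from typing import Optional


def load_acknowledged(path: Path) -> set:
    """Read the 'project:dependency' keys a human has already confirmed."""
    if not path.exists():
        return set()
    return set(json.loads(path.read_text()))


def acknowledge(path: Path, project: str, dependency: str) -> None:
    """Record a 'yes, I'm sure' so the warning is not repeated."""
    acked = load_acknowledged(path)
    acked.add(f"{project}:{dependency}")
    path.write_text(json.dumps(sorted(acked), indent=2))


def commit_warning(path: Path, project: str, dependency: str,
                   commit_count: int, minimum: int = 20) -> Optional[str]:
    """Return a warning message, or None if there is nothing to say."""
    if commit_count >= minimum:
        return None
    if f"{project}:{dependency}" in load_acknowledged(path):
        return None
    return f"{dependency} only has {commit_count} commits. Are you sure?"
```

Because the key includes the project name, confirming a dependency in one project does not silence the warning in another.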

  • ambiguity and guessing and checking boxes

yes, "exotic code" came from me, but it's the sort of thing you can only quantify after spending a lot of time with a language and other people's code.
I'm not about to do code diving of every dependency and I'm not always going to be using my preferred languages.
"community enthusiasm" is another great ambiguous checkbox. also came from me, I think.
I can't measure the enthusiasm of others about a library, but exotic code or an elaborate integration is definitely to be considered but still difficult to measure.
I would leave this checkbox to the very end and if all other boxes get ticked, there shouldn't be a reason to go code diving.
If you find yourself code diving, then other checkboxes may be missing from this list (docs, but no api docs, for example)

if alfred can tick most of these boxes automatically, we should only be alerted when one can't.

  • gaming ADRs

if a thing is too hard or inconvenient to do and there are no consequences or checks that it has been done, it will eventually stop being done. for example: updating documentation.
and if there are checks and they are weak, it will be gamed.
I'm not a robot and I stopped aspiring to be one some time ago. I value automation and convenience and being able to measure things unambiguously. Grunt work (batches of dependencies that need The Checklist applied) and arse-covering (all the boxes were ticked!) are boring and will lead to mistakes and drift and gaming behaviour.

  • automate this

from this ADR it's clear we're able to come up with unambiguous metrics we think are important.
we'll probably think of others as time goes on.
if we treat this ADR as something that can be run through manually, if one really has to, then put the easy to check points at the top of the groups and either quantify the ambiguous points into easily checkable boxes or push them to the bottom and have Alfred alert us to potential problems.

@giorgiosironi
Contributor

> alfred should check my dependencies and post a warning if anything is dubious.
> just a simple "this library only has a few commits. are you sure?" type message
> if we wanted to get fancy we could add a button "yes, I'm sure" that makes Alfred swallow that warning for that dependency+project in future

It is probably impractical to check all dependencies listed on every build, but checks can be made against the new ones that differ from the develop/master branch. This is language-specific: composer.json, requirements.txt and so on. Of course it depends on which check to make:

  • running pylint or similar tools on the library clone (easy)
  • going back to the source repository on the right tag and extracting some statistics or a license (medium, primarily because of discovering the right Git commit, and the library may not be on Git or GitHub anyway)
  • checking if the maintainers have been nice or naughty this year (difficult)
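
Restricting checks to newly added dependencies could look roughly like the following for `requirements.txt`. It is a sketch under simplifying assumptions: the two file contents are passed in as strings (how they are read from the base branch and the pull request is left out), pip option lines such as `-r` or `-e` are skipped, and direct URL requirements are not handled. The function names are hypothetical.

```python
import re


def requirement_names(text: str) -> set:
    """Extract normalised package names from requirements.txt-style text."""
    names = set()
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line or line.startswith("-"):  # skip blanks and pip options (-r, -e, ...)
            continue
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            # PEP 503 normalisation: lowercase, runs of - _ . become one hyphen
            names.add(re.sub(r"[-_.]+", "-", match.group(0)).lower())
    return names


def new_dependencies(base_text: str, branch_text: str) -> set:
    """Names present on the branch but not on the base (e.g. develop/master)."""
    return requirement_names(branch_text) - requirement_names(base_text)
```

Only the names returned by `new_dependencies` would then need any of the checks listed above.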

@lsh-0
Contributor

lsh-0 commented Jan 23, 2018

> It is probably impractical to check all dependencies listed on every build

sure. it could consult a project-local 'approved' list that is populated automatically by the tool or explicitly overridden when a human gives it a thumbs up.

obviously language specific, but baby steps first. I'm thinking of the Github and Pypi APIs that give you access to a lot of project information:

$ curl https://pypi.python.org/pypi/requests/json

like licence, downloads, version history and, in the python world, a (self-applied) stability status.
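
As an illustration, the fields mentioned here can be pulled out of that JSON document with a few lines of Python. This sketch uses the `pypi.org` host in place of the older `pypi.python.org` address, reads only the `info` and `releases` keys of the response, and leaves download counts out. The function names and the shape of the returned dictionary are assumptions made for the example.

```python
import json
from urllib.request import urlopen


def summarise(payload: dict) -> dict:
    """Pick out licence, version history size and stability status from a PyPI JSON response."""
    info = payload.get("info", {})
    classifiers = info.get("classifiers") or []
    # The self-applied stability status is a trove classifier,
    # e.g. 'Development Status :: 5 - Production/Stable'.
    status = [c.split(" :: ", 1)[1] for c in classifiers
              if c.startswith("Development Status :: ")]
    return {
        "name": info.get("name"),
        "licence": info.get("license"),
        "latest_version": info.get("version"),
        "release_count": len(payload.get("releases", {})),
        "development_status": status[0] if status else None,
    }


def fetch(package: str) -> dict:
    """Download the JSON document for one package (needs network access)."""
    with urlopen(f"https://pypi.org/pypi/{package}/json") as response:
        return json.load(response)
```

Calling `summarise(fetch("requests"))` would give a small dictionary that a build step could compare against whatever thresholds are agreed.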

> running pylint or similar tools on the library clone (easy)

ha! I'll hold you to that ...

> checking if the maintainers have been nice or naughty this year (difficult)

yeah, difficult to measure. I've never heard of a dependency shitlist before, but I suppose they must exist, like spam blacklists

@giorgiosironi
Contributor

> > running pylint or similar tools on the library clone (easy)
>
> ha! I'll hold you to that ...

running it is easy; whether it gives a good result is another matter


Using third-party software that is difficult to maintain integrated in our environment can make updating it difficult, time-consuming or risky.

Using software with an inappropriate license may present a legal risk, or may risk others choosing not to use our software because of the burden the license subsequently places on them.
Member

nit-pick: in the UK license is the verb, licence is the noun.

Contributor

we need an ADR for British vs. American English 🇬🇧 🇺🇸

Contributor

+1

I used to have a yellow sticky note next to me on the wall that would remind me ;) I'm certain I've lapsed since then

- low complexity to maintain in our environment

## Warning signs
- many open issues
Member

I prefer many open issues without activity

@nlisgo
Member

nlisgo commented Jan 31, 2018

For PHP projects composer is the de facto dependency management tool. I just found this project: composer-plugin-license-check

It has a whitelist and blacklist configuration. It also has a handy script to output the licence for all of the dependencies. This is useful because it reports on all of the dependencies, not just the immediate dependencies. This is the output of composer check-licenses for the annotations project:

Name: elife/annotations
Version: dev-develop
Licenses: MIT
Dependencies:

Name                                           Version             License       Allowed to Use?  
aws/aws-sdk-php                                3.38.4              Apache-2.0    yes              
beberlei/assert                                v2.7.11             BSD-2-Clause  yes              
crell/api-problem                              2.0                 MIT           yes              
csa/guzzle-cache-middleware                    v1.0.0              Apache-2.0    yes              
doctrine/annotations                           v1.4.0              MIT           yes              
doctrine/cache                                 v1.6.2              MIT           yes              
doctrine/instantiator                          1.0.5               MIT           yes              
doctrine/lexer                                 v1.0.1              MIT           yes              
elife/api                                      dev-master a1c7283  MIT           yes              
elife/api-client                               dev-master 560963f  MIT           yes              
elife/api-problem                              v1.1.0              MIT           yes              
elife/api-sdk                                  dev-master a2e479c  MIT           yes              
elife/api-validator                            dev-master 53f10fb  MIT           yes              
elife/bus-sdk                                  dev-master 721aad8  MIT           yes              
elife/content-negotiator                       v1.0.0              MIT           yes              
elife/logging-sdk                              dev-master 81a10f8  MIT           yes              
elife/ping                                     v1.0.0              MIT           yes              
ezyang/htmlpurifier                            v4.9.3              LGPL          yes              
firebase/php-jwt                               v5.0.0              BSD-3-Clause  yes              
giorgiosironi/eris                             0.9.0               MIT           yes              
guzzlehttp/guzzle                              6.3.0               MIT           yes              
guzzlehttp/promises                            v1.3.1              MIT           yes              
guzzlehttp/psr7                                1.4.2               MIT           yes              
jms/metadata                                   1.6.0               Apache-2.0    yes              
jms/parser-lib                                 1.0.0               Apache2       yes              
jms/serializer                                 1.9.1               Apache-2.0    yes              
justinrainbow/json-schema                      5.2.6               MIT           yes              
knplabs/console-service-provider               v2.1.0              MIT           yes              
league/commonmark                              0.17.0              BSD-3-Clause  yes              
metasyntactical/composer-plugin-license-check  v0.1.0              MIT           yes              
mindplay/composer-locator                      2.1.3               MIT           yes              
monolog/monolog                                1.23.0              MIT           yes              
mtdowling/jmespath.php                         2.4.0               MIT           yes              
myclabs/deep-copy                              1.7.0               MIT           yes              
ocramius/package-versions                      1.1.3               MIT           yes              
paragonie/random_compat                        v2.0.11             MIT           yes              
phpcollection/phpcollection                    0.5.0               Apache2       yes              
phpdocumentor/reflection-common                1.0.1               MIT           yes              
phpdocumentor/reflection-docblock              4.1.1               MIT           yes              
phpdocumentor/type-resolver                    0.4.0               MIT           yes              
phpoption/phpoption                            1.5.0               Apache2       yes              
phpspec/prophecy                               v1.7.2              MIT           yes              
phpunit/php-code-coverage                      4.0.8               BSD-3-Clause  yes              
phpunit/php-file-iterator                      1.4.2               BSD-3-Clause  yes              
phpunit/php-text-template                      1.2.1               BSD-3-Clause  yes              
phpunit/php-timer                              1.0.9               BSD-3-Clause  yes              
phpunit/php-token-stream                       2.0.1               BSD-3-Clause  yes              
phpunit/phpunit                                5.7.25              BSD-3-Clause  yes              
phpunit/phpunit-mock-objects                   3.4.4               BSD-3-Clause  yes              
pimple/pimple                                  v3.2.2              MIT           yes              
psr/container                                  1.0.0               MIT           yes              
psr/http-message                               1.0.1               MIT           yes              
psr/log                                        1.0.2               MIT           yes              
sebastian/code-unit-reverse-lookup             1.0.1               BSD-3-Clause  yes              
sebastian/comparator                           1.2.4               BSD-3-Clause  yes              
sebastian/diff                                 1.4.3               BSD-3-Clause  yes              
sebastian/environment                          2.0.0               BSD-3-Clause  yes              
sebastian/exporter                             2.0.0               BSD-3-Clause  yes              
sebastian/global-state                         1.1.1               BSD-3-Clause  yes              
sebastian/object-enumerator                    2.0.1               BSD-3-Clause  yes              
sebastian/recursion-context                    2.0.0               BSD-3-Clause  yes              
sebastian/resource-operations                  1.0.0               BSD-3-Clause  yes              
sebastian/version                              2.0.1               BSD-3-Clause  yes              
silex/silex                                    v2.2.0              MIT           yes              
symfony/browser-kit                            v3.3.13             MIT           yes              
symfony/console                                v3.3.13             MIT           yes              
symfony/debug                                  v3.4.1              MIT           yes              
symfony/dom-crawler                            v3.3.13             MIT           yes              
symfony/event-dispatcher                       v3.4.1              MIT           yes              
symfony/filesystem                             v3.4.3              MIT           yes              
symfony/http-foundation                        v3.4.1              MIT           yes              
symfony/http-kernel                            v3.4.1              MIT           yes              
symfony/inflector                              v3.3.13             MIT           yes              
symfony/polyfill-mbstring                      v1.6.0              MIT           yes              
symfony/polyfill-php70                         v1.6.0              MIT           yes              
symfony/property-access                        v3.3.13             MIT           yes              
symfony/psr-http-message-bridge                v1.0.2              MIT           yes              
symfony/routing                                v3.3.14             MIT           yes              
symfony/serializer                             v3.3.13             MIT           yes              
symfony/yaml                                   v3.3.13             MIT           yes              
webmozart/assert                               1.2.0               MIT           yes              
willdurand/negotiation                         v2.3.1              MIT           yes              
zendframework/zend-diactoros                   1.6.1               BSD-2-Clause  yes              

@lsh-0
Contributor

lsh-0 commented Feb 1, 2018

that's really great actually. I wonder if the Python world has some hidden built in like that I haven't seen before ..

@seanwiseman
Contributor

seanwiseman commented Feb 2, 2018

Just had a play around, I have been able to generate the same table for a python project using the pkg_resources library (it ships with setuptools rather than the standard library, but is usually already installed, so no additional dependencies 👍 ). If we want to go down this route I could formalize it (would not take long) and we could test it out. The thinking is that we would have a file containing the allowed license types for comparison.

I could also have it as an option part of the proofreader to warn of issues whilst linting. Just an idea.
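
For illustration only, a table like the composer one can be approximated for installed Python packages as below. This is not the script described above: it swaps `pkg_resources` for `importlib.metadata`, which is in the standard library from Python 3.8, and the `ALLOWED` set and function names are hypothetical. Licence strings in package metadata are free-form, so exact matching against an allow-list is a simplification.

```python
from importlib import metadata

# Hypothetical allow-list; in practice this would live in a project file.
ALLOWED = {"MIT", "MIT License", "BSD License", "Apache Software License"}


def licence_from(meta) -> str:
    """Best-effort licence string from a distribution's metadata headers."""
    for field in ("License-Expression", "License"):
        licence = (meta.get(field) or "").strip()
        if licence and licence.upper() != "UNKNOWN":
            return licence
    # Fall back to trove classifiers such as 'License :: OSI Approved :: MIT License'.
    for classifier in meta.get_all("Classifier") or []:
        if classifier.startswith("License ::"):
            return classifier.rsplit(" :: ", 1)[-1]
    return "UNKNOWN"


def licence_table(allowed=ALLOWED):
    """Rows of (name, version, licence, allowed?) for every installed distribution."""
    rows = []
    for dist in metadata.distributions():
        meta = dist.metadata
        licence = licence_from(meta)
        rows.append((meta.get("Name", "?"), meta.get("Version", "?"),
                     licence, licence in allowed))
    return sorted(rows, key=lambda row: row[0].lower())
```

Printing each row of `licence_table()` with fixed-width columns gives output in the same spirit as the `composer check-licenses` listing. Some packages put the full licence text in the `License` field, so a real tool would need to shorten or map those values.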

@lsh-0
Contributor

lsh-0 commented Feb 4, 2018

I prefer integrating it into the proofreader to be honest, at least for the python apps.

@giorgiosironi
Contributor

> I have been able to generate the same table for a python project using the pkg_resources library

> I prefer integrating it into the proofreader to be honest, at least for the python apps.

The only caveat is how long it takes; if it's fast enough to be included in every build (which is where proofreader is executed), 👍
