Smooth out "getting started" in the README #33

Ostrzyciel · 2024-04-22T09:29:51Z

Currently the README does not really describe "how to get started with this", but rather offers a comprehensive set of instructions for doing X, Y, Z, and so on. This requires the reader to FIRST read all of this, read the Makefile, understand the complex network of dependencies, get some datasets to process, etc., and THEN run it. This makes it hard for external users to quickly evaluate if the software works.

Here I'm proposing a set of improvements to the README that would make this much more streamlined and ensure that users can really "get started" easily.

1. Provide a pre-built Docker image for each tool. Do not require the user to build it.
2. Add a "Getting started" section at the start of the README (before installation).
   - Specify a minimal set of requirements (for example: any Linux with Docker). Do not require something weird like a Go compiler. Non-developer users should never be required to install a compiler.
   - Focus on ONE path of installation/execution. Something like: "The quickest way to get started is to use the pre-built Docker image. You can simply run it like so:"
   - In the getting started guide NEVER require the user to go to external sources – for example, Docker documentation, some Makefile stored somewhere, some documentation page elsewhere, other repo... This is usually a major source of frustration and it makes the process much longer.
   - In the guide include a few one-liner commands that just work and describe simply what they do and where the user can find the results. For example, a docker run command that processes some files.
   - Absolutely crucial: include some example data for the users to process. You can simply attach extra files to a release on GitHub (see here – there is a box for dropping files below the release description) and then just link to them (they will have permanent URLs after upload). In the guide add, for example, some wget one-liner to fetch this data.
   - In the end the user should be required to copy-paste 2–3 shell commands to get some colorful, nice output. Instant gratification. The software just works. ❇️ ✨ 🌈 :)
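
To make the target concrete, here is a rough sketch of what such a "Getting started" snippet could boil down to. The image name, the data URL, the archive name, and the `/input` and `/output` mount points are all hypothetical placeholders, since no image or example dataset has been published yet. The `run` wrapper only prints the commands by default, so the sketch can be tried without Docker or network access.

```shell
#!/bin/sh
# Sketch only: IMAGE, DATA_URL and the container paths are hypothetical
# placeholders and must be replaced with real values before running for real.
IMAGE="ghcr.io/OWNER/TOOL:latest"
DATA_URL="https://github.com/OWNER/REPO/releases/download/example-data/example-data.zip"

# Print each command instead of executing it, unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# 1. Fetch the example data attached to a GitHub release.
run wget -O example-data.zip "$DATA_URL"
# 2. Unpack it into ./input.
run unzip -o example-data.zip -d input
# 3. Process it with the pre-built image; results would land in ./output.
run docker run --rm -v "$PWD/input:/input" -v "$PWD/output:/output" "$IMAGE"
```

In the README itself, the three commands would appear without the wrapper, so that the user really only copy-pastes three lines.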


Kaszanas · 2024-04-22T15:37:45Z

> Provide a pre-built Docker image for each tool. Do not require the user to build it.

I think the README was originally meant to be more of a developer documentation file. A pre-built Docker image for each tool, or one common Docker image, will be part of the CI. If the Docker image cannot be built, the PR cannot be merged.

> Specify a minimal set of requirements (for example: any Linux with Docker). Do not require something weird like a Go compiler. Non-developer users should never be required to install a compiler.

Well, this depends entirely on the individual setup that the end-user would like to have. By default all of these tools can be used as Python scripts or through a provided Docker image. There won't be any specific requirements, as we assume that this software should work on all systems. The only issue is being able to test it effectively on different operating systems (I don't know if we can introduce that to the CI?)

https://docs.github.com/en/actions/automating-builds-and-tests/building-and-testing-python#using-the-default-python-version

Currently all of the tests are dockerized. So we can either try to support different operating systems, or assume that if Docker is working then all of the tools will work.

> Focus on ONE path of installation/execution. Something like: "The quickest way to get started is to use the pre-built Docker image. You can simply run it like so:"

Agreed. Do you have any opinions on that? pip installable vs Docker?

> In the getting started guide NEVER require the user to go to external sources – for example, Docker documentation, some Makefile stored somewhere, some documentation page elsewhere, other repo... This is usually a major source of frustration and it makes the process much longer.

Agreed. In the end all of these tools should be available through this repository which will be a base: https://github.com/Kaszanas/SC2Tools

> In the guide include a few one-liner commands that just work and describe simply what they do and where the user can find the results. For example, a docker run command that processes some files.

Agreed. Each of the tools should have separate commands in this case. Finally, there should be one command (pipeline) that is able to reproduce the output in the form of a final dataset for people to use. All of the tools in this repository were built to assist with some task around preparing a dataset.

> Absolutely crucial: include some example data for the users to process. You can simply attach extra files to a release on GitHub (see here – there is a box for dropping files below the release description) and then just link to them (they will have permanent URLs after upload). In the guide add, for example, some wget one-liner to fetch this data.

Adding to a release manually every time will be a pain. I think in this case there should be some other solution that would require minimal oversight.

> In the end the user should be required to copy-paste 2–3 shell commands to get some colorful, nice output. Instant gratification. The software just works. ❇️ ✨ 🌈 :)

Fully agreed. I'd even settle on having one command that takes an input and produces an output that contains a final dataset that can be used with this: https://github.com/Kaszanas/SC2_Datasets

Finally

Would you be able to contribute some of these solutions? Do we split these between ourselves?

Ostrzyciel · 2024-04-25T12:23:34Z

> I think the README was originally meant to be more of a developer documentation file. A pre-built Docker image for each tool, or one common Docker image, will be part of the CI. If the Docker image cannot be built, the PR cannot be merged.

I meant that the Docker image should be pushed to some public repository from where the user can download it. I meant that in the README this should be the default option for using the Docker image (not building manually).

> Well, this depends entirely on the individual setup that the end-user would like to have. By default all of these tools can be used as Python scripts or through a provided Docker image. There won't be any specific requirements, as we assume that this software should work on all systems. The only issue is being able to test it effectively on different operating systems (I don't know if we can introduce that to the CI?)
>
> https://docs.github.com/en/actions/automating-builds-and-tests/building-and-testing-python#using-the-default-python-version
>
> Currently all of the tests are dockerized. So we can either try to support different operating systems, or assume that if Docker is working then all of the tools will work.

Things like "linux" or "docker" are also requirements, even when they are obvious ;)

Testing under Windows and even macOS (including macOS on ARM) can be done with GitHub's runners: https://docs.github.com/en/actions/using-github-hosted-runners/about-github-hosted-runners/about-github-hosted-runners It just needs some configuration and testing...

> Agreed. Do you have any opinions on that? pip installable vs Docker?

Docker! :) :) :)

> Adding to a release manually every time will be a pain. I think in this case there should be some other solution that would require minimal oversight.

You don't have to add it to each release (but if you wanted to, you CAN automate it). Simply create a release manually, named "example-data" or whatever, and upload the dataset there. I do that in RiverBench dataset repositories, where each repo has a "source" release which stores some big source file; see for example: https://github.com/RiverBench/dataset-muziekweb/releases/tag/source

In the end the Releases feature of GitHub is a nice wrapper around S3 and... you can still use it as S3 ;)
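
As a small illustration of why this needs so little oversight: files attached to a release are served under a predictable URL built from the owner, repository, tag, and file name. The sketch below only assembles that URL; the `release_asset_url` helper and the `OWNER`, `REPO`, and `example-data.zip` values are made-up placeholders, not names from this repository.

```shell
#!/bin/sh
# Sketch only: release_asset_url is a hypothetical helper, and the arguments
# used below are placeholders for a real owner, repository, tag and file.
release_asset_url() {
    # $1 = owner, $2 = repository, $3 = release tag, $4 = asset file name
    printf 'https://github.com/%s/%s/releases/download/%s/%s\n' "$1" "$2" "$3" "$4"
}

# Print the URL that a wget one-liner in the README would point at.
release_asset_url OWNER REPO example-data example-data.zip

# Uploading can also be scripted with the GitHub CLI, if it is installed:
#   gh release upload example-data example-data.zip --clobber
```

A README one-liner could then be as short as `wget "$(release_asset_url OWNER REPO example-data example-data.zip)"`, or simply the literal URL.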

> Finally
>
> Would you be able to contribute some of these solutions? Do we split these between ourselves?

Some – yes, but I cannot oversee it. Please assign me a specific task and I'll do my best.

Ostrzyciel added the "documentation" label (Improvements or additions to documentation) Apr 22, 2024

Ostrzyciel assigned Kaszanas Apr 22, 2024
