Commit de87a79

Merge pull request #691 from datalad-handbook/print-adjustment-3onwards
Print adjustment chapter 3 onwards
adswa committed Mar 20, 2021
2 parents dfd5509 + bea4e85 commit de87a79
Showing 16 changed files with 157 additions and 159 deletions.
17 changes: 9 additions & 8 deletions docs/basics/101-122-config.rst
@@ -180,9 +180,8 @@ file, you would replace ``--add`` with ``--replace-all`` such as in::
git config --local --replace-all core.editor "vim"

to configure :term:`vim` to be your default editor.

-(Note that while being a good toy example, it is not a common thing to
-configure repository-specific editors)
+Note that while being a good toy example, it is not a common thing to
+configure repository-specific editors.

This example demonstrated the structure of a :command:`git config`
command. By specifying the ``name`` option with ``section.variable``
@@ -192,7 +191,7 @@ a value, one can configure Git, git-annex, and DataLad.
of Git, depending on the scope (local, global, system-wide)
specified in the command.
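
For instance, the scope flags map onto the same ``section.variable`` pattern.
A quick sketch (the values are illustrative, not commands the handbook asks you to run)::

   # [user] section, variable "name", written to .git/config of this repository
   git config --local user.name "Elena Piscopia"
   # [core] section, variable "editor", written to the user's ~/.gitconfig
   git config --global core.editor "vim"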

-.. find-out-more:: If things go wrong
+.. find-out-more:: If things go wrong during Git config

If something goes wrong during the :command:`git config` command,
for example you end up having two keys of the same name because you
@@ -245,16 +244,18 @@ or values in there are irrelevant for understanding the book, your dataset,
or DataLad, and can just be left as they are. The previous section merely served
to de-mystify the :command:`git config` command and the configuration files.
Nevertheless, it might be helpful to get an overview about the meaning of the
-remaining sections in that file, and the following hidden section can give
-you a glimpse of this.
+remaining sections in that file, and the :ref:`find-out-more that dissects this config file further <fom_gitconfig>` can give you a glimpse of this.

-.. find-out-more:: More on this config file
+.. find-out-more:: Dissecting a Git config file further
+   :name: fom_gitconfig
+   :float:

-Let's walk through the Git config file of ``DataLad-101``:
+The second section of ``.git/config`` is a git-annex configuration.
As mentioned above, git-annex will use the
:term:`Git config file` for some of its configurations.
For example, it lists the repository as a
"version 5 repository", and gives the dataset its own git-annex
"version 8 repository", and gives the dataset its own git-annex
UUID. While the "annex-uuid" [#f4]_ looks like yet another cryptic
random string of characters, you have seen a UUID like this before:
A :command:`git annex whereis` displays information about where the
48 changes: 18 additions & 30 deletions docs/basics/101-123-config2.rst
@@ -24,18 +24,6 @@ section by looking into it.

This file lies right in the root of your superdataset:

-.. windows-wit:: Your file contents are slightly different
-
-   Windows users that did not use the custom :term:`git-annex` installer from `http://datasets.datalad.org/datalad/packages/windows/ <http://datasets.datalad.org/datalad/packages/windows/>`_ had to modify the ``.gitattributes`` file at the start of the Basics.
-   Instead of a line that contains "``mimeencoding``", there should be the following two lines::
-
-      *.txt annex.largefiles=nothing
-      code/** annex.largefiles=nothing
-
-   This workaround was necessary because the default way of identifying file types (such as text files or binary files), as implemented by the ``text2git`` configuration option, relies on a process called "mimeencoding" [#f1]_ -- a method to identify file types from their content rather than only their extension.
-   Windows does not support mimeencoding, and hence we had to explicitly add directories or file extensions for files that should not get annexed.
-   Please read on for more insights into the largefiles rules in ``.gitattributes``.

.. runrecord:: _examples/DL-101-123-101
:language: console
:workdir: dl-101/DataLad-101
@@ -457,24 +445,24 @@ configuration option thus is the environment variable ``DATALAD_LOG_LEVEL``.
.. find-out-more:: Some more general information on environment variables
:name: fom-envvar

Names of environment variables are often all-uppercase. While the ``$`` is not part of
the name of the environment variable, it is necessary to *refer* to the environment
variable: To reference the value of the environment variable ``HOME``, for example, you would
need to use ``echo $HOME`` and not ``echo HOME``. However, environment variables are
set without a leading ``$``. There are several ways to set an environment variable
(note that there are no spaces before or after the ``=``!), leading to different
levels of availability of the variable:

- ``THEANSWER=42 <command>`` makes the variable ``THEANSWER`` available for the process in ``<command>``.
  For example, ``DATALAD_LOG_LEVEL=debug datalad get <file>`` will execute the :command:`datalad get`
  command (and only this one) with the log level set to "debug".
- ``export THEANSWER=42`` makes the variable ``THEANSWER`` available for other processes in the
  same session, but it will not be available to other shells.
- ``echo 'export THEANSWER=42' >> ~/.bashrc`` will write the variable definition to the
  ``.bashrc`` file and thus make it available to all future shells of the user (i.e., this
  makes the variable permanent for the user).

To list all of the configured environment variables, type ``env`` into your terminal.
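
As a concrete sketch of how such a variable relates to its configuration-file
counterpart (``datalad.log.level`` being the configuration key that
``DATALAD_LOG_LEVEL`` maps to)::

   # one-off: debug logging for a single command
   DATALAD_LOG_LEVEL=debug datalad status
   # session-wide: all subsequent DataLad commands in this shell
   export DATALAD_LOG_LEVEL=debug
   # permanent: the equivalent user-level configuration
   git config --global datalad.log.level debug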


Summary
14 changes: 5 additions & 9 deletions docs/basics/101-124-procedures.rst
@@ -28,11 +28,6 @@ nothing more than a simple script that
- writes the relevant configuration (``annex_largefiles = '((mimeencoding=binary)and(largerthan=0))'``, i.e., "Do not put anything that is a text file in the annex") to the ``.gitattributes`` file of a dataset (see the sketch after this list), and
- saves this modification with the commit message "Instruct annex to add text files to Git".
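
In essence, the procedure boils down to something like the following sketch (a
shell paraphrase, not DataLad's verbatim Python source)::

   # append the largefiles rule to the dataset's .gitattributes ...
   echo "* annex.largefiles=((mimeencoding=binary)and(largerthan=0))" >> .gitattributes
   # ... and save the modification
   datalad save -m "Instruct annex to add text files to Git" .gitattributes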

-.. windows-wit:: Why this configuration does not work for Windows users
-
-   If you're on a **Windows 10** machine with a **native** (i.e., non-:term:`WSL`-based) installation of DataLad and did **not** use the custom :term:`git-annex` installer from `http://datasets.datalad.org/datalad/packages/windows/ <http://datasets.datalad.org/datalad/packages/windows/>`_ at the start of the Basics, the ``text2git`` configuration will lead to errors upon a :command:`datalad save`.
-   This is because MagicMime (used in ``mimeencoding=binary`` to determine the file type of any given file by searching for `magic numbers <https://en.wikipedia.org/wiki/List_of_file_signatures>`_) is not natively available on Windows.

This particular procedure lives in a script called
``cfg_text2git`` in the source code of DataLad. The amount of code
in this script is not large, and the relevant lines of code
@@ -75,10 +70,11 @@ only modify ``.gitattributes``, but can also populate a dataset
with particular content, or automate routine tasks such as
synchronizing dataset content with certain siblings.
What makes them a particularly versatile and flexible tool is
-that anyone can write their own procedures (find a tutorial :ref:`here <fom-procedures>`). If a workflow is
-a standard in a team and needs to be applied often, turning it into
-a script can save time and effort. By pointing DataLad
-to the location the procedures reside in they can be applied, and by
+that anyone can write their own procedures.
+If a workflow is a standard in a team and needs to be applied often, turning it into
+a script can save time and effort.
+To learn how to do this, read the :ref:`tutorial on writing your own procedures <fom-procedures>`.
+By pointing DataLad to the location the procedures reside in they can be applied, and by
including them in a dataset they can even be shared.
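
For example -- a sketch; the custom procedure location is an assumption::

   # apply a procedure that ships with DataLad to the current dataset
   datalad run-procedure cfg_text2git
   # let DataLad discover custom procedures in an additional location
   git config --global datalad.locations.extra-procedures ~/my-procedures
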
And even if the script is simple, it is very handy to have preconfigured
procedures that can be run in a single command line call. In the
57 changes: 30 additions & 27 deletions docs/basics/101-127-yoda.rst
@@ -142,8 +142,7 @@ computational environments, results, ...) in dedicated directories. For example:
project for each analysis, instead of conflating them.

This, for example, would be a directory structure from the root of a
-superdataset of a very comprehensive [#f3]_
-data analysis project complying to the YODA principles:
+superdataset of a very comprehensive data analysis project complying with the YODA principles:

.. code-block:: bash
@@ -171,6 +170,34 @@ data analysis project complying with the YODA principles:
├── HOWTO.md
└── README.md
+You can find some non-DataLad-related advice on structuring your directories in the :ref:`find-out-more on best practices for analysis organization <fom-yodaproject>`.
+
+.. find-out-more:: More best practices for organizing contents in directories
+   :name: fom-yodaproject
+   :float:
+
+   The exemplary YODA directory structure is very comprehensive, and displays many best practices for
+   reproducible data science. For example,
+
+   #. Within ``code/``, it is best practice to add **tests** for the code.
+      These tests can be run to check whether the code still works.
+
+   #. It is even better to further use automated computing, for example
+      `continuous integration (CI) systems <https://en.wikipedia.org/wiki/Continuous_integration>`_,
+      to test the functionality of your functions and scripts automatically.
+      If relevant, the setup for continuous integration frameworks (such as
+      `Travis <https://travis-ci.org>`_) lives outside of ``code/``,
+      in a dedicated ``ci/`` directory.
+
+   #. Include **documents for fellow humans**: notes in a README.md or a HOWTO.md,
+      or even proper documentation (for example, in a dedicated ``docs/`` directory).
+      Within these documents, include all relevant metadata for your analysis. If you are
+      conducting a scientific study, this might be authorship, funding,
+      change log, etc.
+
+   If writing tests for analysis scripts or using continuous integration
+   is a new idea for you, but you want to learn more, check out
+   `this chapter on testing <https://the-turing-way.netlify.com/testing/testing.html>`_.
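
If the idea of a test is unfamiliar, here is a self-contained toy example of what
such a test could look like (hypothetical file name; runnable with
`pytest <https://pytest.org>`_)::

   # code/tests/test_deduplicate.py -- toy stand-in for real analysis code
   def deduplicate(items):
       """Return the unique items, sorted."""
       return sorted(set(items))

   def test_deduplicate_removes_repeats():
       assert deduplicate([3, 1, 3, 2]) == [1, 2, 3]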

There are many advantages to this modular way of organizing contents.
Having input data as independent components that are not altered (only
@@ -329,7 +356,7 @@ and by means of which command.

With another DataLad command one can even go one step further:
The command :command:`datalad containers-run` (it will be introduced in
-a later part of the book) performs a command execution within
+section :ref:`containersrun`) performs a command execution within
a configured containerized environment. Thus, not only inputs,
outputs, command, time, and author, but also the *software environment*
are captured as provenance of a dataset component such as a results file,
@@ -427,30 +454,6 @@ YODA principles.
a comprehensive guide to reproducible data science, or read about it in
section :ref:`containersrun`.
-.. [#f3] This directory structure is very comprehensive, and displays many best-practices for
-   reproducible data science. For example,
-
-   #. Within ``code/``, it is best practice to add **tests** for the code.
-      These tests can be run to check whether the code still works.
-
-   #. It is even better to further use automated computing, for example
-      `continuous integration (CI) systems <https://en.wikipedia.org/wiki/Continuous_integration>`_,
-      to test the functionality of your functions and scripts automatically.
-      If relevant, the setup for continuous integration frameworks (such as
-      `Travis <https://travis-ci.org>`_) lives outside of ``code/``,
-      in a dedicated ``ci/`` directory.
-
-   #. Include **documents for fellow humans**: Notes in a README.md or a HOWTO.md,
-      or even proper documentation (for example, in a dedicated ``docs/`` directory).
-      Within these documents, include all relevant metadata for your analysis. If you are
-      conducting a scientific study, this might be authorship, funding,
-      change log, etc.
-
-   If writing tests for analysis scripts or using continuous integration
-   is a new idea for you, but you want to learn more, check out
-   `this excellent chapter on testing <https://the-turing-way.netlify.com/testing/testing.html#Acceptance_testing>`_
-   in the book `The Turing Way <https://the-turing-way.netlify.com/introduction/introduction>`_.
.. [#f4] Substitute unfeasible with *wasteful*, *impractical*, or simply *stupid* if preferred.
.. [#f5] To re-read how ``.gitattributes`` work, go back to section :ref:`config`, and to remind yourself
23 changes: 9 additions & 14 deletions docs/basics/101-130-yodaproject.rst
@@ -12,7 +12,7 @@ In principle, you can prepare YODA-compliant data analyses in any programming
language of your choice. But because you are already familiar with
the `Python <https://www.python.org/>`__ programming language, you decide
to script your analysis in Python. Delighted, you find out that there is even
-a Python API for DataLad's functionality that you can read about in :ref:`a Findoutmore <fom-pythonapi>`.
+a Python API for DataLad's functionality that you can read about in :ref:`a Findoutmore on DataLad in Python <fom-pythonapi>`.

.. find-out-more:: DataLad's Python API
:name: fom-pythonapi
@@ -29,11 +29,6 @@ a Python API for DataLad's functionality that you can read about in :ref:`a Findoutmore on DataLad in Python <fom-pythonapi>`.
to the command line, and it is immensely useful when creating reproducible
data analyses."

-This short section will give you an overview on DataLad's Python API and explore
-how to make use of it in an analysis project. Together with the previous
-section on the YODA principles, it is a good basis for a data analysis midterm project
-in Python.

All of DataLad's user-oriented commands are exposed via ``datalad.api``.
Thus, any command can be imported as a stand-alone command like this::
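
   from datalad.api import <COMMAND>

As a minimal, self-contained sketch of this interface in action (the dataset
location and file name are illustrative assumptions)::

   import tempfile
   from datalad.api import create

   # create a dataset in a temporary directory; returns a Dataset instance
   ds = create(tempfile.mkdtemp())
   # add a file and record the change -- the method equivalent of `datalad save`
   (ds.pathobj / "notes.txt").write_text("DataLad's Python API mirrors its CLI.\n")
   ds.save(message="Add notes")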

@@ -130,7 +125,7 @@ of the flowers in centimeters. It is often used in introductory data science
courses for statistical classification techniques in machine learning, and
widely available -- a perfect dataset for your midterm project!

-.. admonition:: Turn data analysis into dynamically generated documents
+.. importantnote:: Turn data analysis into dynamically generated documents

Beyond the contents of this section, we have transformed the example analysis also into a template to write a reproducible paper, following the use case :ref:`usecase_reproducible_paper`.
If you're interested in checking that out, please head over to `github.com/datalad-handbook/repro-paper-sketch/ <https://github.com/datalad-handbook/repro-paper-sketch/>`_.
@@ -445,7 +440,7 @@ re-execution with :command:`datalad rerun` easy.
snippets to copy and paste. However, if you do not want to install any
Python packages, do not execute the remaining code examples in this section
-- an upcoming section on ``datalad containers-run`` will allow you to
-perform the analysis without changing with your Python software-setup.
+perform the analysis without changing your Python software-setup.

.. windows-wit:: You may need to use "python", not "python3"

@@ -645,7 +640,7 @@ The command takes a repository name and GitHub authentication credentials
``github-passwd <PASSWORD>``, with an *oauth* `token <https://docs.github.com/en/free-pro-team@latest/github/authenticating-to-github/creating-a-personal-access-token>`_ stored in the Git
configuration, or interactively).

-.. admonition:: GitHub deprecates its User Password authentication
+.. importantnote:: GitHub deprecates its User Password authentication

GitHub `decided to deprecate user-password authentication <https://developer.github.com/changes/2020-02-14-deprecating-password-auth/>`_ and will only support authentication via personal access token from November 13th 2020 onwards.
Upcoming changes in DataLad's API will reflect this change starting with DataLad version ``0.13.6`` by removing the ``github-passwd`` argument.
@@ -673,7 +668,7 @@ configure this repository as a sibling of the dataset:

.. windows-wit:: Your shell will not display credentials

-Don't be confused if you are prompted for your GitHub credentials, but can't seem to type -- The terminal protects your private information by not displaying what you type.
+Don't be confused if you are prompted for your GitHub credentials, but can't seem to type -- the terminal protects your private information by not displaying what you type.
Simply type in what is requested, and press enter.

.. code-block:: bash
@@ -714,7 +709,7 @@ state of the dataset to this :term:`sibling` with the :command:`datalad push`
proportion of the previous handbook content as a prerequisite. In order to be
not too overwhelmingly detailed, the upcoming sections will approach
:command:`push` from a "learning-by-doing" perspective:
-You will see a first :command:`push` to GitHub below, and the findoutmore at
+You will see a first :command:`push` to GitHub below, and the :ref:`find-out-more on the published dataset <fom-midtermclone>` at
the end of this section will already give a practical glimpse into the
difference between annexed contents and contents stored in Git when pushed
to GitHub. The chapter :ref:`chapter_thirdparty` will extend on this,
@@ -892,11 +887,11 @@ reproduce your data science project easily from scratch (take a look into the :r
configured your dataset. If you want to re-read the full chapter on
configurations and run-procedures, start with section :ref:`config`.
-.. [#f5] Instead of using GitHub's WebUI you could also obtain a token using the command line GitHub interface (https://github.com/sociomantic/git-hub) by running ``git hub setup`` (if no 2FA is used).
+.. [#f5] Instead of using GitHub's WebUI you could also obtain a token using the command line GitHub interface (https://github.com/sociomantic-tsunami/git-hub) by running ``git hub setup`` (if no 2FA is used).
If you decide to use the command line interface, here is help on how to use it:
-Clone the `GitHub repository <https://github.com/sociomantic/git-hub>`_ to your local computer.
+Clone the `GitHub repository <https://github.com/sociomantic-tsunami/git-hub>`_ to your local computer.
Decide whether you want to build a Debian package to install, or install the single-file Python script distributed in the repository.
-Make sure that all `requirements <https://github.com/sociomantic-tsunami/git-hub#dependencies>`_ for your preferred version are installed, and run either ``make deb`` followed by ``sudo dpkg -i deb/git-hub*all.deb`` or ``make install``.
+Make sure that all `requirements <https://github.com/sociomantic-tsunami/git-hub#dependencies>`_ for your preferred version are installed, and run either ``make deb`` followed by ``sudo dpkg -i deb/git-hub*all.deb``, or ``make install``.
.. [#f6] Note that this is a :command:`git push`, not :command:`datalad push`.
Tags could be pushed upon a :command:`datalad push`, though, if one
