Commit de87a79

Merge pull request #691 from datalad-handbook/print-adjustment-3onwards
Print adjustment chapter 3 onwards
adswa committed Mar 20, 2021
2 parents dfd5509 + bea4e85 commit de87a79
Showing 16 changed files with 157 additions and 159 deletions.
17 changes: 9 additions & 8 deletions docs/basics/101-122-config.rst
@@ -180,9 +180,8 @@ file, you would replace ``--add`` with ``--replace-all`` such as in::
git config --local --replace-all core.editor "vim"

to configure :term:`vim` to be your default editor.

-(Note that while being a good toy example, it is not a common thing to
-configure repository-specific editors)
+Note that while being a good toy example, it is not a common thing to
+configure repository-specific editors.

This example demonstrated the structure of a :command:`git config`
command. By specifying the ``name`` option with ``section.variable``
@@ -192,7 +191,7 @@ a value, one can configure Git, git-annex, and DataLad.
of Git, depending on the scope (local, global, system-wide)
specified in the command.
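
For instance, the scope flags map onto the same ``section.variable`` pattern.
A quick sketch (the values are illustrative, not commands the handbook asks you to run)::

   # [user] section, variable "name", written to .git/config of this repository
   git config --local user.name "Elena Piscopia"
   # [core] section, variable "editor", written to the user's ~/.gitconfig
   git config --global core.editor "vim"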

-.. find-out-more:: If things go wrong
+.. find-out-more:: If things go wrong during Git config

If something goes wrong during the :command:`git config` command,
for example you end up having two keys of the same name because you
@@ -245,16 +244,18 @@ or values in there are irrelevant for understanding the book, your dataset,
or DataLad, and can just be left as they are. The previous section merely served
to de-mystify the :command:`git config` command and the configuration files.
Nevertheless, it might be helpful to get an overview about the meaning of the
-remaining sections in that file, and the following hidden section can give
-you a glimpse of this.
+remaining sections in that file, and the :ref:`find-out-more that dissects this config file further <fom_gitconfig>` can give you a glimpse of this.

-.. find-out-more:: More on this config file
+.. find-out-more:: Dissecting a Git config file further
+   :name: fom_gitconfig
+   :float:

-Let's walk through the Git config file of ``DataLad-101``:
+The second section of ``.git/config`` is a git-annex configuration.
As mentioned above, git-annex will use the
:term:`Git config file` for some of its configurations.
For example, it lists the repository as a
"version 5 repository", and gives the dataset its own git-annex
"version 8 repository", and gives the dataset its own git-annex
UUID. While the "annex-uuid" [#f4]_ looks like yet another cryptic
random string of characters, you have seen a UUID like this before:
A :command:`git annex whereis` displays information about where the
48 changes: 18 additions & 30 deletions docs/basics/101-123-config2.rst
@@ -24,18 +24,6 @@ section by looking into it.

This file lies right in the root of your superdataset:

-.. windows-wit:: Your file contents are slightly different
-
-   Windows users that did not use the custom :term:`git-annex` installer from `http://datasets.datalad.org/datalad/packages/windows/ <http://datasets.datalad.org/datalad/packages/windows/>`_ had to modify the ``.gitattributes`` file at the start of the Basics.
-   Instead of a line that contains "``mimeencoding``", there should be the following two lines::
-
-      *.txt annex.largefiles=nothing
-      code/** annex.largefiles=nothing
-
-   This workaround was necessary because the default way of identifying file types (such as text files or binary files), as implemented by the ``text2git`` configuration option, relies on a process called "mimeencoding" [#f1]_ -- a method to identify file types from their content rather than only their extension.
-   Windows does not support mimeencoding, and hence we had to explicitly add directories or file extensions for files that should not get annexed.
-   Please read on for more insights into the largefiles rules in ``.gitattributes``.

.. runrecord:: _examples/DL-101-123-101
:language: console
:workdir: dl-101/DataLad-101
@@ -457,24 +445,24 @@ configuration option thus is the environment variable ``DATALAD_LOG_LEVEL``.
.. find-out-more:: Some more general information on environment variables
:name: fom-envvar

Names of environment variables are often all-uppercase. While the ``$`` is not part of
the name of the environment variable, it is necessary to *refer* to the environment
variable: To reference the value of the environment variable ``HOME``, for example, you would
need to use ``echo $HOME`` and not ``echo HOME``. However, environment variables are
set without a leading ``$``. There are several ways to set an environment variable
(note that there are no spaces before or after the ``=``!), leading to different
levels of availability of the variable:

- ``THEANSWER=42 <command>`` makes the variable ``THEANSWER`` available for the process in ``<command>``.
  For example, ``DATALAD_LOG_LEVEL=debug datalad get <file>`` will execute the :command:`datalad get`
  command (and only this one) with the log level set to "debug".
- ``export THEANSWER=42`` makes the variable ``THEANSWER`` available for other processes in the
  same session, but it will not be available to other shells.
- ``echo 'export THEANSWER=42' >> ~/.bashrc`` will write the variable definition to the
  ``.bashrc`` file and thus make it available to all future shells of the user (i.e., this
  makes the variable permanent for the user).

To list all of the configured environment variables, type ``env`` into your terminal.
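
As a concrete sketch of how such a variable relates to its configuration-file
counterpart (``datalad.log.level`` being the configuration key that
``DATALAD_LOG_LEVEL`` maps to)::

   # one-off: debug logging for a single command
   DATALAD_LOG_LEVEL=debug datalad status
   # session-wide: all subsequent DataLad commands in this shell
   export DATALAD_LOG_LEVEL=debug
   # permanent: the equivalent user-level configuration
   git config --global datalad.log.level debug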


Summary
14 changes: 5 additions & 9 deletions docs/basics/101-124-procedures.rst
@@ -28,11 +28,6 @@ nothing more than a simple script that
- writes the relevant configuration (``annex_largefiles = '((mimeencoding=binary)and(largerthan=0))'``, i.e., "Do not put anything that is a text file in the annex") to the ``.gitattributes`` file of a dataset (see the sketch after this list), and
- saves this modification with the commit message "Instruct annex to add text files to Git".
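
In essence, the procedure boils down to something like the following sketch (a
shell paraphrase, not DataLad's verbatim Python source)::

   # append the largefiles rule to the dataset's .gitattributes ...
   echo "* annex.largefiles=((mimeencoding=binary)and(largerthan=0))" >> .gitattributes
   # ... and save the modification
   datalad save -m "Instruct annex to add text files to Git" .gitattributes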

-.. windows-wit:: Why this configuration does not work for Windows users
-
-   If you're on a **Windows 10** machine with a **native** (i.e., non-:term:`WSL`-based) installation of DataLad and did **not** use the custom :term:`git-annex` installer from `http://datasets.datalad.org/datalad/packages/windows/ <http://datasets.datalad.org/datalad/packages/windows/>`_ at the start of the Basics, the ``text2git`` configuration will lead to errors upon a :command:`datalad save`.
-   This is because MagicMime (used in ``mimeencoding=binary`` to determine the file type of any given file by searching for `magic numbers <https://en.wikipedia.org/wiki/List_of_file_signatures>`_) is not natively available on Windows.

This particular procedure lives in a script called
``cfg_text2git`` in the source code of DataLad. The amount of code
in this script is not large, and the relevant lines of code
@@ -75,10 +70,11 @@ only modify ``.gitattributes``, but can also populate a dataset
with particular content, or automate routine tasks such as
synchronizing dataset content with certain siblings.
What makes them a particularly versatile and flexible tool is
-that anyone can write their own procedures (find a tutorial :ref:`here <fom-procedures>`). If a workflow is
-a standard in a team and needs to be applied often, turning it into
-a script can save time and effort. By pointing DataLad
-to the location the procedures reside in they can be applied, and by
+that anyone can write their own procedures.
+If a workflow is a standard in a team and needs to be applied often, turning it into
+a script can save time and effort.
+To learn how to do this, read the :ref:`tutorial on writing your own procedures <fom-procedures>`.
+By pointing DataLad to the location the procedures reside in they can be applied, and by
including them in a dataset they can even be shared.
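
For example -- a sketch; the custom procedure location is an assumption::

   # apply a procedure that ships with DataLad to the current dataset
   datalad run-procedure cfg_text2git
   # let DataLad discover custom procedures in an additional location
   git config --global datalad.locations.extra-procedures ~/my-procedures
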
And even if the script is simple, it is very handy to have preconfigured
procedures that can be run in a single command line call. In the
57 changes: 30 additions & 27 deletions docs/basics/101-127-yoda.rst
@@ -142,8 +142,7 @@ computational environments, results, ...) in dedicated directories. For example:
project for each analysis, instead of conflating them.

This, for example, would be a directory structure from the root of a
-superdataset of a very comprehensive [#f3]_
-data analysis project complying to the YODA principles:
+superdataset of a very comprehensive data analysis project complying with the YODA principles:

.. code-block:: bash
@@ -171,6 +170,34 @@ data analysis project complying with the YODA principles:
├── HOWTO.md
└── README.md
+You can find some non-DataLad-related advice on structuring your directories in the :ref:`find-out-more on best practices for analysis organization <fom-yodaproject>`.
+
+.. find-out-more:: More best practices for organizing contents in directories
+   :name: fom-yodaproject
+   :float:
+
+   The exemplary YODA directory structure is very comprehensive, and displays many best practices for
+   reproducible data science. For example,
+
+   #. Within ``code/``, it is best practice to add **tests** for the code.
+      These tests can be run to check whether the code still works.
+
+   #. It is even better to further use automated computing, for example
+      `continuous integration (CI) systems <https://en.wikipedia.org/wiki/Continuous_integration>`_,
+      to test the functionality of your functions and scripts automatically.
+      If relevant, the setup for continuous integration frameworks (such as
+      `Travis <https://travis-ci.org>`_) lives outside of ``code/``,
+      in a dedicated ``ci/`` directory.
+
+   #. Include **documents for fellow humans**: notes in a README.md or a HOWTO.md,
+      or even proper documentation (for example, in a dedicated ``docs/`` directory).
+      Within these documents, include all relevant metadata for your analysis. If you are
+      conducting a scientific study, this might be authorship, funding,
+      change log, etc.
+
+   If writing tests for analysis scripts or using continuous integration
+   is a new idea for you, but you want to learn more, check out
+   `this chapter on testing <https://the-turing-way.netlify.com/testing/testing.html>`_.
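
If the idea of a test is unfamiliar, here is a self-contained toy example of what
such a test could look like (hypothetical file name; runnable with
`pytest <https://pytest.org>`_)::

   # code/tests/test_deduplicate.py -- toy stand-in for real analysis code
   def deduplicate(items):
       """Return the unique items, sorted."""
       return sorted(set(items))

   def test_deduplicate_removes_repeats():
       assert deduplicate([3, 1, 3, 2]) == [1, 2, 3]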

There are many advantages to this modular way of organizing contents.
Having input data as independent components that are not altered (only
@@ -329,7 +356,7 @@ and by means of which command.

With another DataLad command one can even go one step further:
The command :command:`datalad containers-run` (it will be introduced in
-a later part of the book) performs a command execution within
+section :ref:`containersrun`) performs a command execution within
a configured containerized environment. Thus, not only inputs,
outputs, command, time, and author, but also the *software environment*
are captured as provenance of a dataset component such as a results file,
@@ -427,30 +454,6 @@ YODA principles.
a comprehensive guide to reproducible data science, or read about it in
section :ref:`containersrun`.
-.. [#f3] This directory structure is very comprehensive, and displays many best-practices for
-   reproducible data science. For example,
-
-   #. Within ``code/``, it is best practice to add **tests** for the code.
-      These tests can be run to check whether the code still works.
-
-   #. It is even better to further use automated computing, for example
-      `continuous integration (CI) systems <https://en.wikipedia.org/wiki/Continuous_integration>`_,
-      to test the functionality of your functions and scripts automatically.
-      If relevant, the setup for continuous integration frameworks (such as
-      `Travis <https://travis-ci.org>`_) lives outside of ``code/``,
-      in a dedicated ``ci/`` directory.
-
-   #. Include **documents for fellow humans**: Notes in a README.md or a HOWTO.md,
-      or even proper documentation (for example, in a dedicated ``docs/`` directory).
-      Within these documents, include all relevant metadata for your analysis. If you are
-      conducting a scientific study, this might be authorship, funding,
-      change log, etc.
-
-   If writing tests for analysis scripts or using continuous integration
-   is a new idea for you, but you want to learn more, check out
-   `this excellent chapter on testing <https://the-turing-way.netlify.com/testing/testing.html#Acceptance_testing>`_
-   in the book `The Turing Way <https://the-turing-way.netlify.com/introduction/introduction>`_.
.. [#f4] Substitute unfeasible with *wasteful*, *impractical*, or simply *stupid* if preferred.
.. [#f5] To re-read how ``.gitattributes`` work, go back to section :ref:`config`, and to remind yourself
23 changes: 9 additions & 14 deletions docs/basics/101-130-yodaproject.rst
@@ -12,7 +12,7 @@ In principle, you can prepare YODA-compliant data analyses in any programming
language of your choice. But because you are already familiar with
the `Python <https://www.python.org/>`__ programming language, you decide
to script your analysis in Python. Delighted, you find out that there is even
-a Python API for DataLad's functionality that you can read about in :ref:`a Findoutmore <fom-pythonapi>`.
+a Python API for DataLad's functionality that you can read about in :ref:`a Findoutmore on DataLad in Python <fom-pythonapi>`.

.. find-out-more:: DataLad's Python API
:name: fom-pythonapi
@@ -29,11 +29,6 @@ a Python API for DataLad's functionality that you can read about in :ref:`a Findoutmore on DataLad in Python <fom-pythonapi>`.
to the command line, and it is immensely useful when creating reproducible
data analyses."

-This short section will give you an overview on DataLad's Python API and explore
-how to make use of it in an analysis project. Together with the previous
-section on the YODA principles, it is a good basis for a data analysis midterm project
-in Python.

All of DataLad's user-oriented commands are exposed via ``datalad.api``.
Thus, any command can be imported as a stand-alone command like this::
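
   from datalad.api import <COMMAND>

As a minimal, self-contained sketch of this interface in action (the dataset
location and file name are illustrative assumptions)::

   import tempfile
   from datalad.api import create

   # create a dataset in a temporary directory; returns a Dataset instance
   ds = create(tempfile.mkdtemp())
   # add a file and record the change -- the method equivalent of `datalad save`
   (ds.pathobj / "notes.txt").write_text("DataLad's Python API mirrors its CLI.\n")
   ds.save(message="Add notes")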

@@ -130,7 +125,7 @@ of the flowers in centimeters. It is often used in introductory data science
courses for statistical classification techniques in machine learning, and
widely available -- a perfect dataset for your midterm project!

-.. admonition:: Turn data analysis into dynamically generated documents
+.. importantnote:: Turn data analysis into dynamically generated documents

Beyond the contents of this section, we have transformed the example analysis also into a template to write a reproducible paper, following the use case :ref:`usecase_reproducible_paper`.
If you're interested in checking that out, please head over to `github.com/datalad-handbook/repro-paper-sketch/ <https://github.com/datalad-handbook/repro-paper-sketch/>`_.
@@ -445,7 +440,7 @@ re-execution with :command:`datalad rerun` easy.
snippets to copy and paste. However, if you do not want to install any
Python packages, do not execute the remaining code examples in this section
-- an upcoming section on ``datalad containers-run`` will allow you to
-perform the analysis without changing with your Python software-setup.
+perform the analysis without changing your Python software-setup.

.. windows-wit:: You may need to use "python", not "python3"

@@ -645,7 +640,7 @@ The command takes a repository name and GitHub authentication credentials
``github-passwd <PASSWORD>``, with an *oauth* `token <https://docs.github.com/en/free-pro-team@latest/github/authenticating-to-github/creating-a-personal-access-token>`_ stored in the Git
configuration, or interactively).

-.. admonition:: GitHub deprecates its User Password authentication
+.. importantnote:: GitHub deprecates its User Password authentication

GitHub `decided to deprecate user-password authentication <https://developer.github.com/changes/2020-02-14-deprecating-password-auth/>`_ and will only support authentication via personal access token from November 13th 2020 onwards.
Upcoming changes in DataLad's API will reflect this change starting with DataLad version ``0.13.6`` by removing the ``github-passwd`` argument.
@@ -673,7 +668,7 @@ configure this repository as a sibling of the dataset:

.. windows-wit:: Your shell will not display credentials

-Don't be confused if you are prompted for your GitHub credentials, but can't seem to type -- The terminal protects your private information by not displaying what you type.
+Don't be confused if you are prompted for your GitHub credentials, but can't seem to type -- the terminal protects your private information by not displaying what you type.
Simply type in what is requested, and press enter.

.. code-block:: bash
@@ -714,7 +709,7 @@ state of the dataset to this :term:`sibling` with the :command:`datalad push`
proportion of the previous handbook content as a prerequisite. In order to be
not too overwhelmingly detailed, the upcoming sections will approach
:command:`push` from a "learning-by-doing" perspective:
-You will see a first :command:`push` to GitHub below, and the findoutmore at
+You will see a first :command:`push` to GitHub below, and the :ref:`find-out-more on the published dataset <fom-midtermclone>` at
the end of this section will already give a practical glimpse into the
difference between annexed contents and contents stored in Git when pushed
to GitHub. The chapter :ref:`chapter_thirdparty` will extend on this,
@@ -892,11 +887,11 @@ reproduce your data science project easily from scratch (take a look into the :r
configured your dataset. If you want to re-read the full chapter on
configurations and run-procedures, start with section :ref:`config`.
-.. [#f5] Instead of using GitHub's WebUI you could also obtain a token using the command line GitHub interface (https://github.com/sociomantic/git-hub) by running ``git hub setup`` (if no 2FA is used).
+.. [#f5] Instead of using GitHub's WebUI you could also obtain a token using the command line GitHub interface (https://github.com/sociomantic-tsunami/git-hub) by running ``git hub setup`` (if no 2FA is used).
If you decide to use the command line interface, here is help on how to use it:
-Clone the `GitHub repository <https://github.com/sociomantic/git-hub>`_ to your local computer.
+Clone the `GitHub repository <https://github.com/sociomantic-tsunami/git-hub>`_ to your local computer.
Decide whether you want to build a Debian package to install, or install the single-file Python script distributed in the repository.
-Make sure that all `requirements <https://github.com/sociomantic-tsunami/git-hub#dependencies>`_ for your preferred version are installed, and run either ``make deb`` followed by ``sudo dpkg -i deb/git-hub*all.deb`` or ``make install``.
+Make sure that all `requirements <https://github.com/sociomantic-tsunami/git-hub#dependencies>`_ for your preferred version are installed, and run either ``make deb`` followed by ``sudo dpkg -i deb/git-hub*all.deb``, or ``make install``.
.. [#f6] Note that this is a :command:`git push`, not :command:`datalad push`.
Tags could be pushed upon a :command:`datalad push`, though, if one
