wip
pgierz committed Oct 11, 2024
1 parent 62e1131 commit 7cadae7
Showing 5 changed files with 93 additions and 92 deletions.
40 changes: 40 additions & 0 deletions doc/including_custom_steps.rst
@@ -0,0 +1,40 @@
========================================
Develop: Including Custom Pipeline Steps
========================================

To include custom pipeline steps in your pipeline, you can add them to the
pipeline's ``steps`` attribute. For example, to include a custom step that
is defined in ``my_module.py`` and is named ``my_custom_step``, you can
declare it like this:

.. code-block:: yaml

    pipelines:
      - name: custom_pipeline
        steps:
          - custom_package.my_module.my_custom_step

In the file ``my_module.py``, which is somewhere in ``custom_package``,
you can define the custom step like this:

.. code-block:: python

    def my_custom_step(data, rule):
        # Do something with the data
        return data

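
As a slightly fuller illustration of the ``(data, rule)`` signature, the sketch below defines a step that scales its input and then applies a list of steps in order. The ``scale_factor`` attribute, the ``SimpleNamespace`` stand-in for the rule object, and the ``run_steps`` helper are assumptions made for this example; they are not taken from ``pymorize`` itself.

```python
from types import SimpleNamespace


def scale_by_factor(data, rule):
    """Hypothetical custom step: multiply the data by a factor read from the rule."""
    # ``scale_factor`` is an illustrative attribute, not a documented rule key.
    factor = getattr(rule, "scale_factor", 1.0)
    return data * factor


def run_steps(data, rule, steps):
    """Illustrative driver: feed each step's return value into the next step."""
    for step in steps:
        data = step(data, rule)
    return data


# Stand-in for the rule object that a pipeline would pass to each step.
rule = SimpleNamespace(scale_factor=2.0)
result = run_steps(1.5, rule, [scale_by_factor, scale_by_factor])
print(result)
```

The point of the sketch is only that a step receives the data together with the rule and must return the (possibly modified) data so that a following step can pick it up.
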
This works best if you have a full-fledged Python package, with a proper
``setup.py`` file, that you can install in your environment. If you don't
have a package, you can also define the custom step in a separate Python
file and import it in your pipeline configuration file:

.. code-block:: yaml

    pipelines:
      - name: custom_pipeline
        steps:
          - script:///albedo/home/pgierz/Code/playground/my_custom_step.py::my_custom_step

Note that the ``script://`` prefix is required! After the prefix, the path should still
start with a slash, i.e. use an absolute path all the way. The function name comes last,
separated from the file path by the colon separator shown in the example above.
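
To make the ``script://`` form more concrete, here is a minimal sketch of how such a string could be turned into a callable with the standard library. It is not the loader that ``pymorize`` uses; the helper name ``load_step_from_script`` is hypothetical, and the sketch deliberately tolerates either one or two colons before the function name.

```python
import importlib.util
import tempfile
from pathlib import Path


def load_step_from_script(spec):
    """Hypothetical helper: resolve 'script://<absolute path>:<function>' to a callable."""
    prefix = "script://"
    if not spec.startswith(prefix):
        raise ValueError(f"expected a {prefix!r} prefix, got {spec!r}")
    # Split at the last colon; rstrip tolerates a doubled separator.
    path_part, _, func_name = spec[len(prefix):].rpartition(":")
    script_path = Path(path_part.rstrip(":"))
    if not script_path.is_absolute():
        raise ValueError(f"script path must be absolute: {script_path}")
    # Import the file as a module without requiring an installed package.
    module_spec = importlib.util.spec_from_file_location(script_path.stem, script_path)
    module = importlib.util.module_from_spec(module_spec)
    module_spec.loader.exec_module(module)
    return getattr(module, func_name)


# Demonstration with a throwaway script file.
with tempfile.TemporaryDirectory() as tmp:
    script = Path(tmp) / "my_custom_step.py"
    script.write_text("def my_custom_step(data, rule):\n    return data\n")
    step = load_step_from_script(f"script://{script}::my_custom_step")
    result = step([1, 2, 3], None)
```

Because the path is absolute, the string begins with ``script:///``: two slashes from the prefix and one from the path itself.
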
3 changes: 1 addition & 2 deletions doc/index.rst
@@ -16,11 +16,10 @@ Contents
pymorize_building_blocks
pymorize_config_file
pymorize_cli
including_subcommand_plugins
schemas
developer_guide
including_custom_steps
including_subcommand_plugins
developer_guide
API


49 changes: 49 additions & 0 deletions doc/pymorize_cli.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,49 @@
===========================
Usage: The ``pymorize`` CLI
===========================

``pymorize`` is the command line interface to the ``pymorize`` package. It provides
a simple way to drive the underlying Python code without needing to know many
details about what happens behind the scenes. The CLI is the recommended way to
get started.

You can get help with::

    pymorize -h

The CLI is divided into a few subcommands. The main one you will want to use is::

    pymorize process <configuration_yaml>

This will process the configuration file and run the CMORization process. Read on for
a full summary of the commands.

* ``pymorize develop``: Tools for developers

  - Subcommand ``ls``: Lists a directory and stores the output as a ``yaml``. Possibly
    useful for development work and creating in-memory representations of certain folders.

* ``pymorize externals``: List external program status

  You might want to use ``NCO`` or ``CDO`` in your workflows. The ``pymorize externals`` command
  lists information about the currently found versions for these two programs.

* ``pymorize plugins``: Extending the command line interface

  The user can extend the pymorize CLI by adding their own plugins to the main command. This
  lists the docstrings of those plugins.

  .. note:: Paul will probably throw this out when we clean up the project for release.

* ``pymorize process``: The main command. Takes a yaml file and runs through the CMORization process.

* ``pymorize ssh-tunnel``: Creates port forwarding for Dask and Prefect dashboards. You should provide
  your username and the remote **compute** node, **not the login node**. The tunnels will default to ``8787`` for
  Dask and ``4200`` for Prefect.

  .. important:: You need to run this from your laptop!

* ``pymorize table-explorer``: Opens up the web-based table explorer. This is a simple way to explore the
  tables that are available in the CMIP6 data request.

* ``pymorize validate``: Runs checks on a configuration file.
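
If you want to drive the CLI from a script, one possible pattern is to validate a configuration before processing it. The sketch below assumes that ``pymorize validate`` accepts the configuration file as a positional argument in the same way as ``pymorize process``, and that a failed check produces a non-zero exit status; neither is confirmed by the text above, and both helper names are hypothetical.

```python
import subprocess


def pymorize_command(subcommand, config_path):
    """Build the argument list for ``pymorize <subcommand> <configuration_yaml>``."""
    return ["pymorize", subcommand, str(config_path)]


def validate_then_process(config_path):
    """Run ``pymorize validate`` first and only process if validation succeeds."""
    for subcommand in ("validate", "process"):
        # check=True raises CalledProcessError on a non-zero exit status,
        # which stops the loop before ``process`` is reached.
        subprocess.run(pymorize_command(subcommand, config_path), check=True)
```

Calling ``validate_then_process("config.yaml")`` requires ``pymorize`` to be installed and on your ``PATH``.
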
90 changes: 1 addition & 89 deletions doc/pymorize_config_file.rst
@@ -2,92 +2,4 @@
Usage: The ``pymorize`` Configuration File
==========================================

The configuration file used for ``pymorize`` is a simple YAML file. A breakdown of each section is provided below.

+----------------+----------+--------------+-------------------------------------------------------------+
| Parameter      | Required | Type         | Description                                                 |
+================+==========+==============+=============================================================+
| cmor_table_dir | REQUIRED | Path         | The directory where the CMOR tables are stored. This is     |
|                |          |              | used to find the CMOR tables when reading in data.          |
+----------------+----------+--------------+-------------------------------------------------------------+
| output_dir     | REQUIRED | Path         | The main directory where model output is stored.            |
+----------------+----------+--------------+-------------------------------------------------------------+
| rules          | REQUIRED | List of      | A list of rules that define how to process the data.        |
|                |          | Dictionaries | Each rule is a dictionary with the following keys:          |
|                |          |              |                                                             |
|                |          |              | - model_variable: The name of the variable as it is in the  |
|                |          |              |   model output.                                             |
|                |          |              | - cmor_variable: The name of the variable as it is in the   |
|                |          |              |   CMOR tables.                                              |
|                |          |              | - cmor_table: The name of the CMOR table to use.            |
|                |          |              | - input_patterns: list of patterns to apply this rule to    |
|                |          |              | - output_pattern: list of files to create. See note about   |
|                |          |              |   placeholder replacements                                  |
|                |          |              | - actions: list of actions. See below for more information. |
|                |          |              |                                                             |
+----------------+----------+--------------+-------------------------------------------------------------+

Example
-------
To better illustrate, here is a full example with a single rule:

.. code-block:: yaml

    cmor_table_dir: /path/to/table/dir
    output_dir: /path/to/output/dir
    rules:
      - model_variable: salt
        model_units: PSU
        cmor_variable: so
        cmor_table: CMIP6_Omon.json
        input_patterns:
          - /path/to/fesom/output/files/*_salt.nc
        output_pattern: salt.nc
        actions:
          - invert_z_axis: True
          - linear_transform:
              slope: 1.0
              intercept: 0.0

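
The three top-level keys marked REQUIRED in the table can be checked with a few lines of Python once the YAML has been loaded into a dictionary. This is only an illustrative sketch: the helper ``missing_required_keys`` is hypothetical, and the dictionary below is typed by hand to mirror the example rather than read from a file.

```python
REQUIRED_KEYS = ("cmor_table_dir", "output_dir", "rules")


def missing_required_keys(config):
    """Return the required top-level keys that are absent from ``config``."""
    return [key for key in REQUIRED_KEYS if key not in config]


# Hand-written mirror of the YAML example above.
config = {
    "cmor_table_dir": "/path/to/table/dir",
    "output_dir": "/path/to/output/dir",
    "rules": [
        {
            "model_variable": "salt",
            "model_units": "PSU",
            "cmor_variable": "so",
            "cmor_table": "CMIP6_Omon.json",
            "input_patterns": ["/path/to/fesom/output/files/*_salt.nc"],
            "output_pattern": "salt.nc",
        }
    ],
}

missing = missing_required_keys(config)
print(missing)
```
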
Input and Output Pattern Placeholders
-------------------------------------

.. note:: The key names are ``input_patterns`` (PLURAL) and ``output_pattern`` (SINGULAR)

The input and output patterns can contain placeholders that are replaced with the appropriate values. The following placeholders are available:

* ``{model_variable}``: The name of the variable as it is in the model output.
* ``{cmor_variable}``: The name of the variable as it is in the CMOR tables.
* ``{cmor_table}``: The name of the CMOR table to use.
* ``{date}``: The date of the model output.

Dates can be further formatted using Python ``strftime`` format codes. For example, to format the date as ``YYYYMMDD``, use ``{date:%Y%m%d}``. More information on the format codes can be found in the `Python documentation <https://docs.python.org/3/library/datetime.html#format-codes>`_.
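
The placeholder syntax matches that of Python's built-in ``str.format``, so its effect can be previewed in plain Python. Whether ``pymorize`` performs the replacement with ``str.format`` internally is an assumption here, and the pattern and values are made up for the example.

```python
from datetime import datetime

# Illustrative pattern using three of the placeholders listed above.
pattern = "{cmor_variable}_{cmor_table}_{date:%Y%m%d}.nc"

filename = pattern.format(
    cmor_variable="so",
    cmor_table="CMIP6_Omon",
    date=datetime(2000, 1, 31),
)
print(filename)  # so_CMIP6_Omon_20000131.nc
```

A ``datetime`` object passes everything after the colon to ``strftime``, which is why ``%Y%m%d`` yields an eight-digit date.
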

Actions
-------

The actions are a list of dictionaries that define how to process the data. The key of each dictionary is a fully qualified Python callable. You can then assign positional arguments as a list and keyword arguments as a dictionary.

For example, the following actions:

.. code-block:: yaml

    actions:
      - invert_z_axis:
          args:
            - True
          kwargs: {}
      - linear_transform:
          args: []
          kwargs:
            slope: 1.0
            intercept: 0.0

would call the following Python code:

.. code-block:: python

    data = invert_z_axis(data, True)
    data = linear_transform(data, slope=1.0, intercept=0.0)

The actions are applied in the order they are listed in the configuration file. The data argument is the variable in the file that matches the ``model_variable`` key in the rule, for each file described by the ``input_patterns`` key.
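
The mapping from an action entry to a function call can be sketched as a small dispatcher. The two functions below are toy stand-ins that operate on plain lists, and ``apply_actions`` with its name-to-function registry is a hypothetical helper; the real actions work on model data and are looked up by their fully qualified names.

```python
def invert_z_axis(data, flag):
    """Toy stand-in: reverse the sequence when ``flag`` is true."""
    return list(reversed(data)) if flag else list(data)


def linear_transform(data, slope=1.0, intercept=0.0):
    """Toy stand-in: apply ``slope * x + intercept`` element-wise."""
    return [slope * x + intercept for x in data]


def apply_actions(data, actions, registry):
    """Call each action in order as ``func(data, *args, **kwargs)``."""
    for action in actions:
        # Each action is a single-key dictionary: {callable name: {args, kwargs}}.
        name, spec = next(iter(action.items()))
        spec = spec or {}
        func = registry[name]
        data = func(data, *spec.get("args", []), **spec.get("kwargs", {}))
    return data


registry = {"invert_z_axis": invert_z_axis, "linear_transform": linear_transform}
actions = [
    {"invert_z_axis": {"args": [True], "kwargs": {}}},
    {"linear_transform": {"args": [], "kwargs": {"slope": 1.0, "intercept": 0.0}}},
]
result = apply_actions([1, 2, 3], actions, registry)
print(result)  # [3.0, 2.0, 1.0]
```

Because each call receives the return value of the previous one, swapping two entries in ``actions`` can change the result.
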
...
3 changes: 2 additions & 1 deletion doc/schemas.rst
@@ -2,7 +2,8 @@
User Configuration Schemas
==========================

This page documents the configuration schemas used for validation in the project.
This page documents the configuration schemas used for validation of the
yaml configuration file.


.. cerberus-schema:: Pipeline Schema