AlpAribal/kedro-inspect

kedro-inspect

Overview

The single objective of kedro-inspect is to decouple the representation of a Kedro pipeline from its implementation and execution. This is useful for inspecting the pipeline without having access to the Kedro project or setting up dependencies that are only needed when running the pipeline.

Once we isolate the pipeline representation, we can use it for various purposes, such as analysing its structure, documenting it, or sharing it with others.

This representation can be saved to a static file (e.g. JSON). The saved pipeline can then be visualised using the Kedro-Viz package, or any other tool (written in any programming language) that can read the pipeline file format.
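As an illustration of what such a tool could look like, here is a minimal Python sketch that derives node-to-node dependencies from a saved file using only the standard library. It assumes the file is a JSON object with a top-level "nodes" list whose entries carry "name", "inputs" and "outputs", and that the dataset fields may be a string, a list, or a dict. The second node and the helper names `as_names` and `node_edges` are hypothetical and exist only for this example.

```python
import json
from collections import defaultdict

# Illustrative data only: the file shape is assumed, and the second node
# is hypothetical (it is not taken from real kedro-inspect output).
SAVED_PIPELINE = """
{
  "nodes": [
    {"name": "preprocess_companies_node",
     "inputs": "companies",
     "outputs": "preprocessed_companies"},
    {"name": "hypothetical_downstream_node",
     "inputs": ["preprocessed_companies", "reviews"],
     "outputs": "model_input_table"}
  ]
}
"""


def as_names(value):
    """Normalise a dataset field (None, str, list or dict) to a list of names."""
    if value is None:
        return []
    if isinstance(value, str):
        return [value]
    if isinstance(value, dict):
        return list(value.values())
    return list(value)


def node_edges(pipeline):
    """Return sorted (producer, consumer, dataset) triples between nodes."""
    # Map each dataset name to the nodes that produce it.
    producers = defaultdict(list)
    for node in pipeline["nodes"]:
        for dataset in as_names(node.get("outputs")):
            producers[dataset].append(node["name"])

    # A node depends on every node that produces one of its inputs.
    edges = []
    for node in pipeline["nodes"]:
        for dataset in as_names(node.get("inputs")):
            for producer in producers.get(dataset, []):
                edges.append((producer, node["name"], dataset))
    return sorted(edges)


pipeline = json.loads(SAVED_PIPELINE)
edges = node_edges(pipeline)
for producer, consumer, dataset in edges:
    print(f"{producer} -> {consumer} (via {dataset})")
```

Nothing here imports Kedro or the project code, which is the point of working from the static file: the analysis needs only a JSON parser.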

Inspection

The plan is to make inspection richer over time, i.e. to add more information to the pipeline representation, such as fine-grained type information or package dependencies per node.

This added information can be useful for various purposes, such as:

  • Generating documentation & schemas for the pipeline
  • Visualisation
  • Optimising pipeline execution
  • Generating a pipeline test suite

Comparison with current Kedro functionality

Kedro already provides serialisation of the pipeline. The crucial difference is that kedro-inspect does not require the Kedro project, and hence can be used without setting up the project or its dependencies.

Usage

usage: kedro-inspect [-h] [-p PIPELINE] [-o OUTPUT] [--indent INDENT] project_path

Inspect a Kedro pipeline.

positional arguments:
  project_path          path to the Kedro project

optional arguments:
  -h, --help            show this help message and exit
  -p PIPELINE, --pipeline PIPELINE
                        name of the pipeline to inspect (default: __default__)
  -o OUTPUT, --output OUTPUT
                        path to the output file (default: None)
  --indent INDENT       indentation for JSON output (default: None)
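
When the tool is called from a script rather than a shell, the options above can be assembled programmatically. The following is a minimal sketch, assuming `kedro-inspect` is installed as an executable on the PATH. The helper names `build_command` and `run_inspect` and the example paths are hypothetical; only the flags come from the help text above.

```python
import subprocess


def build_command(project_path, pipeline=None, output=None, indent=None):
    """Assemble the kedro-inspect argument list from the documented options."""
    command = ["kedro-inspect"]
    if pipeline is not None:
        command += ["--pipeline", pipeline]
    if output is not None:
        command += ["--output", str(output)]
    if indent is not None:
        command += ["--indent", str(indent)]
    # The project path is the single positional argument.
    command.append(str(project_path))
    return command


def run_inspect(project_path, **options):
    """Run the CLI (assumed to be on the PATH); raises on a non-zero exit code."""
    return subprocess.run(build_command(project_path, **options), check=True)


# Only build and show the command here; call run_inspect(...) to execute it.
print(build_command("path/to/spaceflights-pandas", output="pipeline.json", indent=4))
```

Omitted options are left out of the command entirely, so the tool's own defaults (such as the `__default__` pipeline) apply.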

Running kedro-inspect on spaceflights-pandas, we get a list of representations of the nodes in the pipeline. For example, the first node is represented as follows:

"nodes": [
        {
            "name": "preprocess_companies_node",
            "tags": [],
            "confirms": [],
            "namespace": null,
            "inputs": "companies",
            "outputs": "preprocessed_companies",
            "function": {
                "func": "spaceflights_pandas.pipelines.data_processing.nodes.preprocess_companies",
                "parameters": [
                    {
                        "name": "companies",
                        "kind": "POSITIONAL_OR_KEYWORD",
                        "type_hint": "pandas.core.frame.DataFrame"
                    }
                ],
                "return_value": "pandas.core.frame.DataFrame"
            },
            "param_to_input": {
                "companies": [
                    "companies"
                ]
            }
        },
        ...
]
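
As a rough sketch of how this representation could feed documentation, the Python snippet below turns the node shown above into a one-line signature plus its parameter-to-dataset mapping. It relies only on the fields visible in the example. The helper names `render_signature` and `describe_node` are hypothetical, and the sketch ignores the "kind" field, so it would not mark `*args`-style parameters correctly.

```python
import json

# The node from the example above, wrapped as a standalone JSON object.
NODE_JSON = """
{
    "name": "preprocess_companies_node",
    "tags": [],
    "confirms": [],
    "namespace": null,
    "inputs": "companies",
    "outputs": "preprocessed_companies",
    "function": {
        "func": "spaceflights_pandas.pipelines.data_processing.nodes.preprocess_companies",
        "parameters": [
            {
                "name": "companies",
                "kind": "POSITIONAL_OR_KEYWORD",
                "type_hint": "pandas.core.frame.DataFrame"
            }
        ],
        "return_value": "pandas.core.frame.DataFrame"
    },
    "param_to_input": {
        "companies": ["companies"]
    }
}
"""


def render_signature(function):
    """Render 'name(param: type, ...) -> return_type' from a "function" entry."""
    params = ", ".join(
        f"{p['name']}: {p['type_hint']}" if p.get("type_hint") else p["name"]
        for p in function["parameters"]
    )
    # Keep only the last component of the dotted import path.
    short_name = function["func"].rsplit(".", 1)[-1]
    returns = function.get("return_value")
    suffix = f" -> {returns}" if returns else ""
    return f"{short_name}({params}){suffix}"


def describe_node(node):
    """Return a short text description: signature, then dataset bindings."""
    lines = [f"{node['name']}: {render_signature(node['function'])}"]
    for param, datasets in node["param_to_input"].items():
        lines.append(f"  {param} <- {', '.join(datasets)}")
    return "\n".join(lines)


node = json.loads(NODE_JSON)
description = describe_node(node)
print(description)
```

Because the type hints are stored as plain strings, the description can be produced without importing pandas or the project's own modules.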
