Skip to content

Commit

Permalink
Merge pull request #352 from quantumblacklabs/release/0.15.5
Browse files Browse the repository at this point in the history
Release 0.15.5
  • Loading branch information
andrii-ivaniuk authored Dec 12, 2019
2 parents b97440a + cdd983c commit 98a6c8f
Show file tree
Hide file tree
Showing 126 changed files with 6,282 additions and 1,049 deletions.
4 changes: 2 additions & 2 deletions .pre-commit-config.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -73,7 +73,7 @@ repos:
language: system
types: [file, python]
files: ^tests/
entry: pylint --disable=missing-docstring,redefined-outer-name,no-self-use,invalid-name,protected-access
entry: pylint --disable=missing-docstring,redefined-outer-name,no-self-use,invalid-name,protected-access,too-many-arguments
stages: [commit]

# The same pylint checks, but running on all files. It's for manual run with `make lint`
Expand All @@ -100,7 +100,7 @@ repos:
language: system
pass_filenames: false
stages: [manual]
entry: pylint --disable=missing-docstring,redefined-outer-name,no-self-use,invalid-name,protected-access tests
entry: pylint --disable=missing-docstring,redefined-outer-name,no-self-use,invalid-name,protected-access,too-many-arguments tests
# We need to make some exceptions for 3.5, that's why it's a custom runner.
- id: black
name: "Black"
Expand Down
2 changes: 2 additions & 0 deletions CONTRIBUTING.md
Original file line number Diff line number Diff line change
Expand Up @@ -133,6 +133,8 @@ make lint

#### Unit tests, 100% coverage (`pytest`, `pytest-cov`)

Note that you will need the dependencies installed in `test_requirements.txt`, which includes `memory-profiler`. If you are on a Unix-like system, you may need to install the necessary build tools. See the `README.md` [file](/kedro/contrib/decorators/README.md) in `kedro.contrib.decorators` for more information.

```bash
make test
```
Expand Down
123 changes: 44 additions & 79 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,31 +1,27 @@
![Kedro Logo Banner](https://raw.githubusercontent.com/quantumblacklabs/kedro/master/img/kedro_banner.jpg)

`develop` | `master`
----------|---------
[![CircleCI](https://circleci.com/gh/quantumblacklabs/kedro/tree/develop.svg?style=shield)](https://circleci.com/gh/quantumblacklabs/kedro/tree/develop) | [![CircleCI](https://circleci.com/gh/quantumblacklabs/kedro/tree/master.svg?style=shield)](https://circleci.com/gh/quantumblacklabs/kedro/tree/master)
[![Build status](https://ci.appveyor.com/api/projects/status/2u74p5g8fdc45wwh/branch/develop?svg=true)](https://ci.appveyor.com/project/QuantumBlack/kedro/branch/develop) | [![Build status](https://ci.appveyor.com/api/projects/status/2u74p5g8fdc45wwh/branch/master?svg=true)](https://ci.appveyor.com/project/QuantumBlack/kedro/branch/master)
-----------------

[![License](https://img.shields.io/badge/license-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)
[![Python Version](https://img.shields.io/badge/python-3.5%20%7C%203.6%20%7C%203.7-blue.svg)](https://pypi.org/project/kedro/)
[![PyPI version](https://badge.fury.io/py/kedro.svg)](https://pypi.org/project/kedro/)
[![Documentation](https://readthedocs.org/projects/kedro/badge/?version=latest)](https://kedro.readthedocs.io/)
[![Code Style: Black](https://img.shields.io/badge/code%20style-black-black.svg)](https://github.com/ambv/black)
[![Downloads](https://pepy.tech/badge/kedro)](https://pepy.tech/project/kedro)
| Theme | Status |
|------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Latest Release | [![PyPI version](https://badge.fury.io/py/kedro.svg)](https://pypi.org/project/kedro/) |
| Python Version | [![Python Version](https://img.shields.io/badge/python-3.5%20%7C%203.6%20%7C%203.7-blue.svg)](https://pypi.org/project/kedro/) |
| `master` Branch Build | [![CircleCI](https://circleci.com/gh/quantumblacklabs/kedro/tree/master.svg?style=shield)](https://circleci.com/gh/quantumblacklabs/kedro/tree/master) |
| `develop` Branch Build | [![CircleCI](https://circleci.com/gh/quantumblacklabs/kedro/tree/develop.svg?style=shield)](https://circleci.com/gh/quantumblacklabs/kedro/tree/develop) |
| Documentation Build | [![Documentation](https://readthedocs.org/projects/kedro/badge/?version=latest)](https://kedro.readthedocs.io/) |
| License | [![License](https://img.shields.io/badge/license-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0) |
| Code Style | [![Code Style: Black](https://img.shields.io/badge/code%20style-black-black.svg)](https://github.com/ambv/black) |

# What is Kedro?

> "The centre of your data pipeline."
## What is Kedro?

Kedro is a workflow development tool that helps you build data pipelines that are robust, scalable, deployable, reproducible and versioned. We provide a standard approach so that you can:
- spend more time building your data pipeline,
- worry less about how to write production-ready code,
- standardise the way that your team collaborates across your project,
- work more efficiently.
> "The centre of your data pipeline."
Kedro was originally designed by [Aris Valtazanos](https://github.com/arisvqb) and [Nikolaos Tsaousis](https://github.com/tsanikgr) to solve challenges they faced in their project work.
Kedro is a development workflow framework that implements software engineering best-practice for data pipelines with an eye towards productionising machine learning models. We provide a standard approach so that you can:
- Worry less about how to write production-ready code,
- Spend more time building data pipelines that are robust, scalable, deployable, reproducible and versioned,
- And, standardise the way that your team collaborates across your project.

This work was later turned into a product thanks to the following contributors:
[Ivan Danov](https://github.com/idanov), [Dmitrii Deriabin](https://github.com/DmitryDeryabin), [Gordon Wrigley](https://github.com/tolomea), [Yetunde Dada](https://github.com/yetudada), [Nasef Khan](https://github.com/nakhan98), [Kiyohito Kunii](https://github.com/921kiyo), [Nikolaos Kaltsas](https://github.com/nikos-kal), [Meisam Emamjome](https://github.com/misamae), [Peteris Erins](https://github.com/Pet3ris), [Lorena Balan](https://github.com/lorenabalan), [Richard Westenra](https://github.com/richardwestenra) and [Anton Kirilenko](https://github.com/Flid).

## How do I install Kedro?

Expand All @@ -35,94 +31,63 @@ This work was later turned into a product thanks to the following contributors:
pip install kedro
```

For more detailed installation instructions, including how to setup Python virtual environments, please visit our [installation guide](https://kedro.readthedocs.io/en/latest/02_getting_started/02_install.html).

## What are the main features of Kedro?

### 1. Project template and coding standards
See more detailed installation instructions, including how to setup Python virtual environments, in our [installation guide](https://kedro.readthedocs.io/en/latest/02_getting_started/02_install.html) and get started with our ["Hello Word"](https://kedro.readthedocs.io/en/latest/02_getting_started/04_hello_world.html) example.

- A standard and easy-to-use project template
- Configuration for credentials, logging, data loading and Jupyter Notebooks / Lab
- Test-driven development using `pytest`
- [Sphinx](http://www.sphinx-doc.org/en/master/) integration to produce well-documented code
## Why does Kedro exist?

### 2. Data abstraction and versioning
Kedro is built upon our collective best-practice (and mistakes) trying to deliver real-world ML applications that have vast amounts of dirty data. We developed Kedro to achieve the following:

- Separation of the _compute_ layer from the _data handling_ layer, including support for different data formats and storage options
- Versioning for your data sets and machine learning models
- **Collaboration** on an analytics codebase when different team members have varied exposure to software engineering best-practice
- Focussing on **maintainable data and ML pipelines** as the standard, instead of a singular activity of deploying models in production
- A way to inspire the creation of **reusable analytics code** so that we never start from scratch when working on a new project
- **Efficient use of time** because we're able to quickly move from experimentation into production

### 3. Modularity and pipeline abstraction

- Support for pure Python functions, `nodes`, to break large chunks of code into small independent sections
- Automatic resolution of dependencies between `nodes`
- Visualise your data pipeline with [Kedro-Viz](https://github.com/quantumblacklabs/kedro-viz), a tool that shows the pipeline structure of Kedro projects
Kedro was originally designed by [Aris Valtazanos](https://github.com/arisvqb) and [Nikolaos Tsaousis](https://github.com/tsanikgr) to solve challenges they faced in their project work. This work was later turned into a product thanks to the following contributors:
[Ivan Danov](https://github.com/idanov), [Dmitrii Deriabin](https://github.com/DmitryDeryabin), [Gordon Wrigley](https://github.com/tolomea), [Yetunde Dada](https://github.com/yetudada), [Nasef Khan](https://github.com/nakhan98), [Kiyohito Kunii](https://github.com/921kiyo), [Nikolaos Kaltsas](https://github.com/nikos-kal), [Meisam Emamjome](https://github.com/misamae), [Peteris Erins](https://github.com/Pet3ris), [Lorena Balan](https://github.com/lorenabalan), [Richard Westenra](https://github.com/richardwestenra) and [Anton Kirilenko](https://github.com/Flid).

*Note:* Read our [FAQs](https://kedro.readthedocs.io/en/latest/06_resources/01_faq.html#how-does-kedro-compare-to-other-projects) to learn how we differ from workflow managers like Airflow and Luigi.
## What are the main features of Kedro?

![Kedro-Viz Pipeline Visualisation](https://raw.githubusercontent.com/quantumblacklabs/kedro/master/img/pipeline_visualisation.png)
*A pipeline visualisation generated using [Kedro-Viz](https://github.com/quantumblacklabs/kedro-viz)*

### 4. Feature extensibility

- A plugin system that injects commands into the Kedro command line interface (CLI)
- List of officially supported plugins:
- [Kedro-Airflow](https://github.com/quantumblacklabs/kedro-airflow), making it easy to prototype your data pipeline in Kedro before deploying to [Airflow](https://github.com/apache/airflow), a workflow scheduler
- [Kedro-Docker](https://github.com/quantumblacklabs/kedro-docker), a tool for packaging and shipping Kedro projects within containers
- Kedro can be deployed locally, on-premise and cloud (AWS, Azure and GCP) servers, or clusters (EMR, Azure HDinsight, GCP and Databricks)
| Feature | What is this? |
|----------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Project Template | A standard, modifiable and easy-to-use project template based on [Cookiecutter Data Science](https://github.com/drivendata/cookiecutter-data-science/). |
| Data Catalog | A series of lightweight data connectors used for saving and loading data across many different file formats and file systems including local storage, AWS, Azure Blob and Google Cloud Storage. The Data Catalog also includes data and model versioning for file-based systems. Used with a Python or YAML API. |
| Pipeline Abstraction | Automatic resolution of dependencies between pure Python functions and data pipeline visualisation using [Kedro-Viz](https://github.com/quantumblacklabs/kedro-viz). |
| The Journal | An ability to reproduce pipeline runs with saved pipeline run results. |
| Coding Standards | Test-driven development using `pytest`, produce well-documented code using [Sphinx](http://www.sphinx-doc.org/en/master/) and make use of the standard Python logging library. |
| Flexible Deployment | Deployment strategies that include the use of Docker with [Kedro-Docker](https://github.com/quantumblacklabs/kedro-docker), conversion of Kedro pipelines into Airflow DAGs with [Kedro-Airflow](https://github.com/quantumblacklabs/kedro-airflow), leveraging a REST API endpoint with Kedro-Server _(coming soon)_ and serving Kedro pipelines as a Python package. Kedro can be deployed locally, on-premise and cloud (AWS, Azure and Google Cloud Platform) servers, or clusters (EMR, EC2, Azure HDinsight and Databricks). |

## What are the main Kedro building blocks?

You can find the overview of Kedro architecture [here](https://kedro.readthedocs.io/en/latest/06_resources/02_architecture_overview.html).

## How do I use Kedro?

Our [documentation](https://kedro.readthedocs.io/en/latest/) explains:

- A typical Kedro workflow
- How to set up the project configuration
- Building your first pipeline
- How to use the CLI offered by `kedro_cli.py` (`kedro new`, `kedro run`, ...)
- Best-practice on how to [get started using Kedro](https://kedro.readthedocs.io/en/latest/02_getting_started/01_prerequisites.html)
- A ["Hello World" data and ML pipeline example](https://kedro.readthedocs.io/en/latest/02_getting_started/04_hello_world.html) based on the **Iris dataset**
- A two-hour [Spaceflights tutorial](https://kedro.readthedocs.io/en/latest/03_tutorial/01_workflow.html) that teaches you beginner to intermediate functionality
- How to [use the CLI](https://kedro.readthedocs.io/en/latest/06_resources/03_commands_reference.html) offered by `kedro_cli.py` (`kedro new`, `kedro run`, ...)
- An overview of [Kedro architecture](https://kedro.readthedocs.io/en/latest/06_resources/02_architecture_overview.html)
- [Frequently asked questions (FAQs)](https://kedro.readthedocs.io/en/latest/06_resources/01_faq.html)

> *Note:* The CLI is a convenient tool for being able to run `kedro` commands but you can also invoke the Kedro CLI as a Python module with `python -m kedro`
Documentation for the latest stable release can be found [here](https://kedro.readthedocs.io/en/latest/). You can also run `kedro docs` from your CLI and open the documentation for your current version of Kedro in a browser.

## How do I find Kedro documentation?

This CLI command will open the documentation for your current version of Kedro in a browser:

```
kedro docs
```

Documentation for the latest stable release can be found [here](https://kedro.readthedocs.io/en/latest/). Check these out first:
> *Note:* The CLI is a convenient tool for being able to run `kedro` commands but you can also invoke the Kedro CLI as a Python module with `python -m kedro`
- [Getting started](https://kedro.readthedocs.io/en/latest/02_getting_started/01_prerequisites.html)
- [Tutorial](https://kedro.readthedocs.io/en/latest/03_tutorial/01_workflow.html)
- [FAQ](https://kedro.readthedocs.io/en/latest/06_resources/01_faq.html)
*Note:* Read our [FAQs](https://kedro.readthedocs.io/en/latest/06_resources/01_faq.html#how-does-kedro-compare-to-other-projects) to learn how we differ from workflow managers like Airflow and Luigi.

## Can I contribute?

Yes! Want to help build Kedro? Check out our guide to [contributing](https://github.com/quantumblacklabs/kedro/blob/master/CONTRIBUTING.md).

## How do I upgrade Kedro?

We use [Semantic Versioning](http://semver.org/). The best way to safely upgrade is to check our [release notes](https://github.com/quantumblacklabs/kedro/blob/master/RELEASE.md) for any notable breaking changes.

Once Kedro is installed, you can check your version as follows:

```
kedro --version
```

To later upgrade Kedro to a different version, simply run:

```
pip install kedro -U
```

## What licence do you use?

Kedro is licensed under the [Apache 2.0](https://github.com/quantumblacklabs/kedro/blob/master/LICENSE.md) License.


## We're hiring!

Do you want to be part of the team that builds Kedro and [other great products](https://quantumblack.com/labs) at QuantumBlack? If so, you're in luck! QuantumBlack is currently hiring Software Engineers who love using data to drive their decisions. Take a look at [our open positions](https://www.quantumblack.com/careers/current-openings#content) and see if you're a fit.
Loading

0 comments on commit 98a6c8f

Please sign in to comment.