Documentation for Preview Dataset #1757

Merged on Feb 26, 2024 (26 commits)

@rashidakanchwala rashidakanchwala commented Feb 21, 2024

Description

Documentation for the new changes in 'Preview Datasets'

Development notes

QA notes

Checklist

  • Read the contributing guidelines
  • Opened this PR as a 'Draft Pull Request' if it is work-in-progress
  • Updated the documentation to reflect the code changes
  • Added new entries to the RELEASE.md file
  • Added tests to cover my changes


This page describes how to preview data from different datasets in a Kedro project with Kedro-Viz. Dataset preview was introduced in Kedro-Viz version 6.3.0, which offers preview for `CSVDatasets` and `ExcelDatasets`.
To provide users with a glimpse of their datasets within a Kedro project, Kedro-Viz offers a preview feature. This feature was introduced in Kedro-Viz version 6.3.0 and expanded upon in version 8.0.0. Initially, it supported `CSVDatasets` and `ExcelDatasets`, and later extended to encompass additional dataset types such as `PlotlyDatasets` and image datasets like `MatplotlibWriter`.

Contributor commented:

[Nit] ExcelDatasets, and was later extended to ..

```{important}
We recommend that you use the same version of Kedro that was most recently used to test this tutorial (0.19.0). To check the version installed, type `kedro -V` in your terminal window.
```
Whilst we currently support the above datasets. We are soon going to extend this functionality to other datasets. Users with custom datasets can also extend the preview functionality and we will cover that in the following sections.

Contributor commented:

While we currently support the aforementioned datasets, we are soon going to extend this functionality to include other datasets. Users with custom datasets can also expand the preview functionality, and we will cover that in the following sections.


**Extend Preview to Custom Datasets**

The page titled [Extend Preview to Custom Datasets](./preview_custom_datasets.md) contains information on how you can set up preview for custom datasets and what types are supported by Kedro-viz.

Contributor commented:

The page titled Extend Preview to Custom Datasets contains information on how to set up previews for custom datasets and which types are supported by Kedro-Viz



To enable dataset preview, add the `preview_args` attribute to the kedro-viz configuration under the `metadata` section in the Data Catalog. Within preview_args, specify `nrows` as the number of rows to preview for the dataset.
To disable dataset previews for specific datasets, you need to set preview: false under the kedro-viz key within the metadata section of your conf.yml file. Here's an example configuration:

Contributor commented:

Should we mention that previewing is enabled by default from the latest version of Viz? If someone is using an older version of Viz, will it be opt-in? Also, it would be a good idea to mention somewhere in the doc which kedro-datasets version this new Kedro-Viz feature supports. Thank you

Contributor Author commented:

good point.
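
For readers following this thread, the opt-out described in the quoted doc line can be pictured as below. This is only a sketch: the catalog entry is written as the Python dictionary that a YAML catalog would load into, the dataset name `companies` and its filepath are placeholders, and the helper `is_preview_enabled` is hypothetical. The helper mirrors the default-on behaviour raised in the comment above and is not Kedro-Viz's actual code.

```python
# Hypothetical catalog entry, shown as the dict a YAML catalog would load into.
catalog_entry = {
    "companies": {
        "type": "pandas.CSVDataset",
        "filepath": "data/01_raw/companies.csv",
        # Opt out of preview for this dataset only.
        "metadata": {"kedro-viz": {"preview": False}},
    }
}


def is_preview_enabled(entry: dict) -> bool:
    """Illustrative helper (assumption): preview stays on unless set to False."""
    viz_metadata = (entry.get("metadata") or {}).get("kedro-viz") or {}
    return viz_metadata.get("preview", True) is not False
```

An entry without a `metadata` section would keep preview switched on under this assumption.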

```



## Previewing Data on Kedro-viz

@rashidakanchwala (Contributor Author) commented Feb 22, 2024:

FYI - this entire section has now moved to Preview Tabular Data on Kedro-viz

@@ -0,0 +1,74 @@
# Preview Tabular Data in Kedro-viz

Contributor Author commented:

The content in this section is not new; it has just moved to a new page. Earlier it was part of the 'Preview Datasets' page.

@rashidakanchwala rashidakanchwala self-assigned this Feb 22, 2024
@rashidakanchwala rashidakanchwala marked this pull request as ready for review February 22, 2024 10:46
@rashidakanchwala rashidakanchwala requested review from NeroOkwa, astrojuanlu, noklam and merelcht and removed request for tynandebold and yetudada February 22, 2024 10:46

@merelcht (Member) left a comment:

Some initial comments, but I'll have another look next week 🙂

docs/source/experiment_tracking.md (outdated, resolved)
docs/source/preview_custom_datasets.md (outdated, resolved)
docs/source/preview_custom_datasets.md (outdated, resolved)
from kedro_datasets._typing import TablePreview

class CustomDataset:
    def preview(self, nrows: int = 5) -> TablePreview:

Member commented:

Can we maybe add an example that works, to give users a bit more guidance on how they should realistically implement a preview() method?

@rashidakanchwala (Contributor Author) commented Feb 23, 2024:

I will need some help on this as I can't think of a realistic CustomDataset that is not a part of kedro-datasets. @astrojuanlu -- do you have some ideas?

docs/source/preview_datasets.md (outdated, resolved)
docs/source/preview_datasets.md (outdated, resolved)

@noklam (Contributor) left a comment:

A few minor comments. I think the most important one is an example of CustomDataset. Other than that, there are quite a few inconsistent uses of Dataset, Datasets and DataSet.

Comment on lines 48 to 53
preview_args:
  nrows: 15
```

If no preview_args are specified, the default preview will show the first 5 rows.

Contributor commented:

This was asked in Slack once. I think we should make it obvious that `preview_args` are the arguments that get passed directly into the preview function, and that users can have arbitrary arguments.

def preview(self, arg1, arg2):
  ...

Contributor Author commented:

So I think for pandas datasets .. it is specific to nrows as we wrote the preview() func.

But I have updated the custom dataset docs to include arguments. Thanks for highlighting this @noklam
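
The point discussed in this thread can be sketched as follows, assuming (as the comment above describes) that whatever sits under `preview_args` is handed to `preview()` as keyword arguments. The argument names `start` and `count` and the helper `call_preview` are hypothetical and only stand in for the caller; they are not taken from Kedro-Viz.

```python
from typing import Any


class CustomDataset:
    """Hypothetical custom dataset; `start` and `count` are made-up argument names."""

    def __init__(self, rows: list):
        self._rows = rows

    def preview(self, start: int = 0, count: int = 5) -> dict:
        # Return a slice of the stored rows in a table-like dict.
        return {"data": self._rows[start:start + count]}


def call_preview(dataset: Any, viz_metadata: dict) -> dict:
    """Illustrative stand-in for the caller: unpack preview_args into preview()."""
    preview_args = viz_metadata.get("preview_args", {})
    return dataset.preview(**preview_args)


dataset = CustomDataset(rows=list(range(10)))
with_args = call_preview(dataset, {"preview_args": {"start": 2, "count": 3}})
defaults = call_preview(dataset, {})
```

Under this assumption, any key in `preview_args` that `preview()` does not accept would raise a `TypeError`, so the configuration keys and the method signature need to agree.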

docs/source/preview_custom_datasets.md (outdated, resolved)

When creating a custom dataset, if you wish to enable data preview for that dataset, you must implement a `preview()` function within the custom dataset class. Kedro-Viz currently supports previewing tables, Plotly charts, images, and JSON objects.

The return type of the `preview()` function should match one of the following types, as defined in the `kedro-datasets` source code (_typing.py file):

Member commented:

Any chance to add a link to such file?

Contributor Author commented:

is the github link fine ? - https://github.com/kedro-org/kedro-plugins/blob/main/kedro-datasets/kedro_datasets/_typing.py

I can't seem to find docs source code link for _typing.py

Member commented:

Yeah this is not documented, so a link to the source code is fine for now.
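
Until the types are documented, one way to picture them is as thin labels over plain Python values. The sketch below assumes the image preview type is a `typing.NewType` over `str` and that image previews are exchanged as base64-encoded text. `ImagePreview` here is a local stand-in rather than an import from `kedro_datasets._typing`, and `TinyImageDataset` is hypothetical, so the linked source file remains the authority.

```python
import base64
from typing import NewType

# Local stand-in (assumption): a label over a plain string.
ImagePreview = NewType("ImagePreview", str)


class TinyImageDataset:
    """Hypothetical custom dataset wrapping raw image bytes."""

    def __init__(self, raw: bytes):
        self._raw = raw

    def preview(self) -> ImagePreview:
        # Assumption: image previews are passed around as base64-encoded text.
        encoded = base64.b64encode(self._raw).decode("utf-8")
        return ImagePreview(encoded)


image_preview = TinyImageDataset(b"\x89PNG").preview()
```

Because `NewType` adds no runtime wrapper, the returned value is still an ordinary `str`; the annotation mainly signals which kind of preview the method produces.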

docs/source/preview_datasets.md (outdated, resolved)

In your terminal window, navigate to the folder you want to store the project. Generate the spaceflights tutorial project with all the code in place by using the [Kedro starter for the spaceflights tutorial](https://github.com/kedro-org/kedro-starters/tree/main/spaceflights-pandas):
While we currently support the aforementioned datasets, we are soon going to extend this functionality to include other datasets. Users with custom datasets can also expand the preview functionality, which is covered in the section [Extend Preview to Custom Datasets](./preview_custom_datasets.md).

Member commented:

Do we have Vale enabled on this repo? I get the sense that aforementioned would be flagged as too wordy

Contributor Author commented:

i don't think so.

docs/source/preview_datasets.md (outdated, resolved)
We recommend that you use the same version of Kedro that was most recently used to test this tutorial (0.19.0). To check the version installed, type `kedro -V` in your terminal window.
```

In your terminal window, navigate to the folder you want to store the project. Generate the spaceflights tutorial project with all the code in place by using the [Kedro starter for the spaceflights tutorial](https://github.com/kedro-org/kedro-starters/tree/main/spaceflights-pandas):

Member commented:

Probably capitalise Spaceflights?

Suggested change
In your terminal window, navigate to the folder you want to store the project. Generate the spaceflights tutorial project with all the code in place by using the [Kedro starter for the spaceflights tutorial](https://github.com/kedro-org/kedro-starters/tree/main/spaceflights-pandas):
In your terminal window, navigate to the folder you want to store the project. Generate the Spaceflights tutorial project with all the code in place by using the [Kedro starter for the spaceflights tutorial](https://github.com/kedro-org/kedro-starters/tree/main/spaceflights-pandas):

Contributor Author commented:

I noticed it's mostly lower-case elsewhere in the docs so I will leave it as is.

docs/source/preview_pandas_datasets.md (resolved)
docs/source/preview_pandas_datasets.md (outdated, resolved)
docs/source/preview_pandas_datasets.md (outdated, resolved)
docs/source/preview_pandas_datasets.md (outdated, resolved)

@astrojuanlu (Member) left a comment:

LGTM!

@merelcht (Member) left a comment:

I left some minor comments and a suggestion on adding an example preview() method. It's all non blocking though, so I'll approve and let you decide on whether to add it or not 🙂

@@ -0,0 +1,59 @@
# Extend preview to Custom Datasets

Member commented:

Suggested change
# Extend preview to Custom Datasets
# Extend preview to custom datasets

```


## Examples of Previews

Member commented:

Is there a way to show the JSONPreview as well?

@rashidakanchwala (Contributor Author) commented Feb 26, 2024:

So, the JSON preview is actually oriented toward experiment tracking, hence I was hesitant to share it as it might create some confusion. In the next couple of sprints, we will enable preview for a JSONDataset, and I could add that example then.


class CustomDataset:
    def preview(self, nrows, ncolumns, filters) -> TablePreview:
        # Add logic for generating preview

Member commented:

When I suggested adding a working example, I didn't necessarily mean anything complex, just in this case some code that would produce a working TablePreview and demonstrates how nrows, ncolumns and filters would be used.
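
A minimal sketch of the kind of example requested here, under stated assumptions: `TablePreview` is defined locally as a stand-in so the code runs without `kedro-datasets` installed, the returned dict uses the split-style layout (`index`, `columns`, `data`) that pandas produces with `to_json(orient="split")`, and `InMemoryTableDataset` plus the exact meaning of `filters` (exact-match on column values) are hypothetical choices for illustration.

```python
from typing import Any, NewType, Optional

# Local stand-in for the TablePreview type (assumption: a label over dict).
TablePreview = NewType("TablePreview", dict)


class InMemoryTableDataset:
    """Hypothetical custom dataset holding rows as a list of dicts."""

    def __init__(self, rows: list):
        self._rows = rows

    def preview(
        self,
        nrows: int = 5,
        ncolumns: Optional[int] = None,
        filters: Optional[dict] = None,
    ) -> TablePreview:
        filters = filters or {}
        # Keep only rows whose values match every filter exactly.
        matching = [
            row for row in self._rows
            if all(row.get(key) == value for key, value in filters.items())
        ]
        selected = matching[:nrows]
        # Column order follows the first stored row; optionally truncate it.
        columns = list(self._rows[0].keys()) if self._rows else []
        if ncolumns is not None:
            columns = columns[:ncolumns]
        return TablePreview({
            "index": list(range(len(selected))),
            "columns": columns,
            "data": [[row.get(col) for col in columns] for row in selected],
        })


dataset = InMemoryTableDataset([
    {"id": 1, "type": "shuttle", "price": 100.0},
    {"id": 2, "type": "rocket", "price": 250.0},
    {"id": 3, "type": "shuttle", "price": 175.0},
])
table = dataset.preview(nrows=1, ncolumns=2, filters={"type": "shuttle"})
```

Here filtering runs before the row limit, so `nrows` caps the number of matching rows rather than the number of rows scanned; a real dataset might reasonably choose the opposite order.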

docs/source/preview_pandas_datasets.md (outdated, resolved)
docs/source/preview_pandas_datasets.md (outdated, resolved)

@NeroOkwa (Contributor) left a comment:

LGTM!

@ravi-kumar-pilla (Contributor) left a comment:

Awesome work with the docs 💯 ...LGTM

@rashidakanchwala rashidakanchwala merged commit 93fe1c8 into main Feb 26, 2024
5 checks passed
@rashidakanchwala rashidakanchwala deleted the docs/preview-datasets branch February 26, 2024 17:55
@ravi-kumar-pilla ravi-kumar-pilla mentioned this pull request Mar 1, 2024
5 tasks
6 participants