Extend factories syntax to other config files? #3086

astrojuanlu · 2023-09-27T09:31:04Z

Description

Hey all,
This can be done in catalog.yml to access data through namespace, how can I do this in parameters.yml???
"{namespace}.data_fro":
   type: xyz
   filepath: "path/to/data/{namespace}/month.csv"

https://linen-slack.kedro.org/t/15867022/hey-all-this-can-be-done-in-catalog-yml-to-access-data-throu#52b21693-0b9f-43d0-ae46-b4f215720bbe

One last thing… I’ve seen the following syntax for so-called “dataset factory”
"{namespace}.data":
    type: ….
    filepath: …
Is there anything equivalent for parameters ? 🙂
(I’ve tried… and of course.. failed 😅 )

https://linen-slack.kedro.org/t/13222958/hi-everyone-is-there-a-way-to-have-truly-global-params-i-e-p#c273b11c-f60a-4ca1-bc87-9aab4301b327

Context

Opening this issue to collect feedback and use cases.

Maybe the answer is "no", and maybe the answer is "these issues arise from something else" (for example @noklam has suggested that namespaces are confusing). But I think it's important that we centralize the discussion.

Possible Implementation

Possible Alternatives


ankatiyar · 2023-11-30T08:51:56Z

Posting some more questions about using OmegaConfigLoader features with dataset factories that have come up recently -

Is it possible to use OmegaConf nested interpolation in the catalog?
I would like to do something like:

"{table}_ds":
  type: my_custom_ds
  site: "${globals:site}"
  file_id: "${globals:file_id[${table}]}"

and in globals.yml

file_id:
   users: ""
   items: ""
   site: ""

The resolution of the config happens when the config is loaded before a session is run. The dataset factory placeholders are resolved later when the pipeline is being executed. So it isn't currently possible to do this.
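
The ordering described above can be illustrated with a small, library-free sketch. This is not Kedro's or OmegaConf's actual code: the two functions, the `GLOBALS` values, and the `filepath` key are hypothetical stand-ins for the load-time and run-time steps.

```python
import re

# Hypothetical stand-in for the contents of globals.yml.
GLOBALS = {"site": "example_site"}

# A factory entry as it sits in the catalog before anything is resolved.
ENTRY = {
    "type": "my_custom_ds",
    "site": "${globals:site}",
    "filepath": "data/{table}.csv",  # hypothetical key, for illustration only
}


def resolve_globals(entry, globals_):
    """Phase 1 (config load): replace values of the form ${globals:key}."""
    resolved = {}
    for key, value in entry.items():
        match = re.fullmatch(r"\$\{globals:(\w+)\}", value)
        resolved[key] = globals_[match.group(1)] if match else value
    return resolved


def fill_placeholders(entry, names):
    """Phase 2 (pipeline run): replace {placeholder} with the matched name."""
    pattern = re.compile(r"(?<!\$)\{(\w+)\}")
    return {
        key: pattern.sub(lambda m: names.get(m.group(1), m.group(0)), value)
        for key, value in entry.items()
    }


# Phase 1 runs without knowing any dataset name, so {table} is left untouched.
after_load = resolve_globals(ENTRY, GLOBALS)
# Phase 2 runs later, once "users_ds" has been matched against "{table}_ds".
after_run = fill_placeholders(after_load, {"table": "users"})
```

Because phase 1 has already finished by the time `table` is known, a lookup such as `file_id[${table}]` has nothing to index with at load time.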

felipemonroy · 2024-02-02T20:55:59Z

Hi,

Is it possible to partially resolve the config (only the keys without placeholders) before the session run? Then, before the pipeline run, we could resolve the placeholders and perform another OmegaConf resolution.

This is something similar to the catalog resolution, the only difference is that we should resolve with OmegaConf after the placeholder resolution.

My use case would be:

{country}.{city}.training:
    metadata:
        country: "{country}"
        city: "{city}"
    params:
        alpha: 1
        beta: 0
        
# This will be used when namespace is us.new_york, not the factory one
us.new_york.training:
    metadata:
        country: us
        city: new_york
    params:
        alpha: 0.5
        beta: 0.2
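
A rough sketch of how such a lookup could behave, using only the standard library. Everything here is hypothetical (the function names, the rule that a placeholder matches one dot-separated segment, and the use of a plain dict instead of a loaded config), and the second OmegaConf resolution pass mentioned above is left out.

```python
import re

# Parameters mirroring the use case above, written as a plain dict.
PARAMETERS = {
    "{country}.{city}.training": {
        "metadata": {"country": "{country}", "city": "{city}"},
        "params": {"alpha": 1, "beta": 0},
    },
    "us.new_york.training": {
        "metadata": {"country": "us", "city": "new_york"},
        "params": {"alpha": 0.5, "beta": 0.2},
    },
}

_PLACEHOLDER = re.compile(r"\{(\w+)\}")


def pattern_to_regex(pattern):
    """Turn "{a}.{b}.x" into a regex with one named group per placeholder."""
    # re.split with a capture group alternates literal text and placeholder names.
    parts = _PLACEHOLDER.split(pattern)
    return "".join(
        # Assumption: a placeholder matches exactly one dot-separated segment.
        f"(?P<{part}>[^.]+)" if i % 2 else re.escape(part)
        for i, part in enumerate(parts)
    )


def fill(template, names):
    """Recursively substitute {placeholder} in every string value."""
    if isinstance(template, dict):
        return {key: fill(value, names) for key, value in template.items()}
    if isinstance(template, str):
        return _PLACEHOLDER.sub(lambda m: names.get(m.group(1), m.group(0)), template)
    return template


def resolve_parameters(name, parameters):
    """Return the explicit entry if present, else the first matching factory."""
    if name in parameters:
        return parameters[name]
    for pattern, template in parameters.items():
        if not _PLACEHOLDER.search(pattern):
            continue
        match = re.fullmatch(pattern_to_regex(pattern), name)
        if match:
            return fill(template, match.groupdict())
    raise KeyError(name)


explicit = resolve_parameters("us.new_york.training", PARAMETERS)
generated = resolve_parameters("fr.paris.training", PARAMETERS)
```

Here the explicit `us.new_york.training` entry is returned as-is, while `fr.paris.training` falls through to the factory entry and gets its metadata filled in from the name.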

inigohidalgo · 2024-02-07T11:19:05Z

Is it possible to use OmegaConf nested interpolation in the catalog?
I would like to do something like:
"{table}_ds":
  type: my_custom_ds
  site: "${globals:site}"
  file_id: "${globals:file_id[${table}]}"
and in globals.yml
file_id:
   users: ""
   items: ""
   site: ""
The resolution of the config happens when the config is loaded before a session is run. The dataset factory placeholders are resolved later when the pipeline is being executed. So it isn't currently possible to do this.

I have the same need to be able to define dataset factories but customize specific parameters. I would like to be able to do something like this (the example uses simple variable interpolation but ideally I would want to be using globals):

_dataset_config:
  country_technology_granularity:
    partition_method: datetime
    datetime_column: gas_date
    partition_by: [year, month, day]

'{signal_name}__predictions':
  type: axpo.kedro.datasets.pandas_arrow_dataset.ParquetArrowDataset
  path: abfs://container/{signal_name}/predictions/
  credentials: blob_storage
  versioned: true
  write_mode: append
  partition_method: {_dataset_config.{signal_name}.partition_method}

kasperjanehag · 2024-02-14T10:02:15Z

I would love to extend the factories pattern to other config files. When creating dynamic pipelines (reusing part of a pipeline many times as seen in https://getindata.com/blog/kedro-dynamic-pipelines/), you often end up with a lot of datasets having the same names but namespaced. Using the catalog factories pattern, the output of these pipelines could all be persisted with a single catalog entry like:

"{namespace}.{variant}.classification_experiment":
  type: "${_datasets.pickle}"
  backend: cloudpickle
  filepath: "${_base_path}/${_folders.mdl}/{namespace}/{variant}/classification_experiment_train.pkl"
  metadata:
    layer: Model

"{namespace}.{variant}.classification_base_model":
  type: "${_datasets.pickle}"
  filepath: "${_base_path}/${_folders.mdl}/{namespace}/{variant}/classification_base_model.pkl"
  metadata:
    layer: Model

However, since there's no similar functionality for parameters, you then have to create a parameter set for every single variant instead of reusing part of the parameters. For my use cases, I would also need to be able to define parameter factories but customize specific parameters (similar to what @inigohidalgo mentioned, but for parameters). Think of it like:

"{namespace}.{variant}.modelling":
  param_1: value_1
  param_2: value_2
  param_3: value_3

use_case_1.variant_1.modelling:
  param_1: overridden_value_1
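
The "factory defaults plus specific overrides" behaviour asked for here could be sketched as a key-by-key merge. This is a hypothetical illustration in plain Python, not an existing Kedro feature; `deep_merge` and the variable names are made up for the example.

```python
def deep_merge(defaults, overrides):
    """Return a new dict where overrides take precedence, key by key."""
    merged = dict(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are mappings: merge them instead of replacing.
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


# Values the hypothetical "{namespace}.{variant}.modelling" factory would provide.
factory_defaults = {"param_1": "value_1", "param_2": "value_2", "param_3": "value_3"}
# The explicit use_case_1.variant_1.modelling entry only lists what differs.
explicit_entry = {"param_1": "overridden_value_1"}

modelling_params = deep_merge(factory_defaults, explicit_entry)
```

With this merge, `param_2` and `param_3` are inherited from the factory entry and only `param_1` changes, so the explicit entry does not need to repeat the shared values.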

noklam · 2024-02-14T11:19:51Z

Thank you for the example, it's clear and easy to understand.

Can you use a subkey instead of a string literal? I don't think you are supposed to use a.b.c as a key

You can use variable interpolation instead @kasperjanehag

A: ${parameter_groups}

I find this more clear than compiling a pattern and is similar to what YAML anchor does.

kasperjanehag · 2024-02-14T15:52:23Z

@noklam, sorry, good point. I guess you mean:

"{namespace}":
  "{variant}":
    modelling:
      param_1: value_1
      param_2: value_2
      param_3: value_3

use_case_1:
  variant_1:
    modelling:
      param_1: overridden_value_1

noklam · 2024-02-14T16:13:21Z

@kasperjanehag I mean more like leveraging what OmegaConf already supports, so we may not need to introduce a new syntax.

Assuming you don't need to override parameters (I am not sure if there is a way yet, but the point is that we can consider a different approach):

default:
  modelling:
    param_1: value_1
    param_2: value_2
    param_3: value_3

use_case_1:
  variant_1: ${default}

use_case_2:
  variant_2: ${default}

use_case_3:
  variant_3: ${default}

I find this more readable, which is also similar to YAML anchor.

kasperjanehag · 2024-02-16T09:40:09Z

@noklam thanks for the suggestion. The main problem I see with this approach (correct me if I'm wrong) is how polluted the parameter space gets. With your suggested approach (which we're also currently running in a few projects), wouldn't parameters be duplicated and exist both in the use_case_1.variant_1 section and in the default section?

noklam · 2024-02-16T10:29:50Z

This is a fair point. Though this is just a design choice: factory patterns are also duplicated in the config, and we remove them from the resolved version because they are fairly easy to identify.

We can take the same approach: let's say that if a key starts with _, we know that it's not a real config entry.
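
A minimal sketch of that convention, assuming the parameters have already been resolved into a plain dict. The function name and the `_default` key are hypothetical, and only top-level keys are filtered.

```python
def strip_template_keys(config):
    """Drop top-level keys starting with "_" from a resolved config dict."""
    return {
        key: value
        for key, value in config.items()
        # str() guards against non-string keys such as integers parsed from YAML.
        if not str(key).startswith("_")
    }


shared = {"modelling": {"param_1": "value_1", "param_2": "value_2"}}

# Resolved parameters: the template is still present next to its copies.
resolved = {
    "_default": shared,
    "use_case_1": {"variant_1": shared},
    "use_case_2": {"variant_2": shared},
}

visible = strip_template_keys(resolved)
```

After filtering, only `use_case_1` and `use_case_2` remain, so the template no longer shows up as a parameter of its own.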

julnow · 2024-03-06T09:15:46Z

I'd be interested in this as well! Currently I'm creating a data processing pipeline for device measurement data, where each device is identified by its SAP number.

In settings.py each number is added:

DYNAMIC_PIPELINES_MAPPING = {
    "sap_number": [
        "1234",
        "2345",
        "3456",
    ],
}

which is later used in, among other places, the catalog and pipelines, so each SAP number goes through a separate copy of a dynamically created pipeline (as per the getindata tutorial). However, when I pass these numbers to a node which connects to a SQL database and filters records by these SAP numbers, I have to manually define:

lookup_params:
  sap_number: "000000"
  mode: "someMeasurementMode"
  someBoolean: True

sap_number:
  1234:
    _overrides:
      sap_number: "1234"
    lookup_params: ${merge:${lookup_params},${._overrides}}
  2345:
    _overrides:
      sap_number: "2345"
    lookup_params: ${merge:${lookup_params},${._overrides}}
  3456:
    _overrides:
      sap_number: "3456"
    lookup_params: ${merge:${lookup_params},${._overrides}}

Could it be somehow simplified so that my params are defined as:

lookup_params:
  sap_number: ${variant}
  mode: "someMeasurementMode"
  someBoolean: True

(similarly to catalog)
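
Until something like this exists, the repeated blocks could in principle be generated in plain Python from the same mapping. The sketch below is only an illustration: `expand_for_variants` is a hypothetical helper, `${variant}` is treated as an ordinary marker string rather than an OmegaConf interpolation, and where such code would be hooked into a project is left open.

```python
# Same mapping as in settings.py above.
DYNAMIC_PIPELINES_MAPPING = {
    "sap_number": ["1234", "2345", "3456"],
}

# The single template the parameters file would ideally contain.
LOOKUP_PARAMS = {
    "sap_number": "${variant}",
    "mode": "someMeasurementMode",
    "someBoolean": True,
}


def expand_for_variants(template, variants, placeholder="${variant}"):
    """Build one lookup_params block per variant from a shared template."""
    expanded = {}
    for variant in variants:
        expanded[variant] = {
            "lookup_params": {
                # Swap the marker for the variant; copy every other value as-is.
                key: variant if value == placeholder else value
                for key, value in template.items()
            }
        }
    return expanded


sap_number_params = expand_for_variants(
    LOOKUP_PARAMS, DYNAMIC_PIPELINES_MAPPING["sap_number"]
)
```

The result has the same shape as the manually written `sap_number` section, with one `lookup_params` block per SAP number.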

kasperjanehag · 2024-03-06T12:44:19Z

Yes, @julnow, that's how we do it in our project as well. I would love a simpler solution for parameter factories, but haven't figured out the right implementation yet.

astrojuanlu added the Issue: Feature Request New feature or improvement to existing feature label Sep 27, 2023

github-actions bot mentioned this issue Oct 1, 2023

Monthly issue metrics report #3100

Closed

ankatiyar added this to the Dataset Factory Improvements milestone Mar 4, 2024

kedro-org locked and limited conversation to collaborators Mar 28, 2024

merelcht converted this issue into discussion #3751 Mar 28, 2024


This issue was moved to a discussion.
