# Release 0.3.26 (#1373)
# Description

Prefix the PR title with the Jira issue number in the form
`[CDF-12345]`.

Please describe the change you have made.

## Checklist

- [ ] Tests added/updated.
- [ ] Run Demo Job Locally.
- [ ] Documentation updated.
- [ ] Changelogs updated in
[CHANGELOG.cdf-tk.md](https://github.com/cognitedata/toolkit/blob/main/CHANGELOG.cdf-tk.md).
- [ ] Template changelogs updated in
[CHANGELOG.templates.md](https://github.com/cognitedata/toolkit/blob/main/CHANGELOG.templates.md).


[CDF-12345]:
https://cognitedata.atlassian.net/browse/CDF-12345?atlOrigin=eyJpIjoiNWRkNTljNzYxNjVmNDY3MDlhMDU5Y2ZhYzA5YTRkZjUiLCJwIjoiZ2l0aHViLWNvbS1KU1cifQ
doctrino authored Jan 16, 2025
2 parents 6ae46ee + 8a66e3e commit 84ca660
Showing 80 changed files with 945 additions and 284 deletions.
5 changes: 2 additions & 3 deletions .github/pull_request_template.md
Original file line number Diff line number Diff line change
@@ -1,5 +1,7 @@
# Description

Prefix the PR title with the Jira issue number in the form `[CDF-12345]`.

Please describe the change you have made.

## Checklist
@@ -9,6 +11,3 @@ Please describe the change you have made.
- [ ] Documentation updated.
- [ ] Changelogs updated in [CHANGELOG.cdf-tk.md](https://github.com/cognitedata/toolkit/blob/main/CHANGELOG.cdf-tk.md).
- [ ] Template changelogs updated in [CHANGELOG.templates.md](https://github.com/cognitedata/toolkit/blob/main/CHANGELOG.templates.md).
- [ ] Version bumped.
[_version.py](https://github.com/cognitedata/toolkit/blob/main/cognite/cognite_toolkit/_version.py) and
[pyproject.toml](https://github.com/cognitedata/toolkit/blob/main/pyproject.toml) per [semantic versioning](https://semver.org/).
2 changes: 1 addition & 1 deletion .pre-commit-config.yaml
@@ -12,7 +12,7 @@ repos:
- --fixable=E,W,F,I,T,RUF,TID,UP
- --target-version=py39
- id: ruff-format
rev: v0.8.6
rev: v0.9.1

- repo: https://github.com/igorshubovych/markdownlint-cli
rev: v0.43.0
29 changes: 29 additions & 0 deletions CHANGELOG.cdf-tk.md
@@ -15,6 +15,35 @@ Changes are grouped as follows:
- `Fixed` for any bug fixes.
- `Security` in case of vulnerabilities.

## [0.3.26] - 2025-01-16

### Added

- [alpha feature] `cdf import transformation-cli` now has a new flag `--clean` to remove the
source files after importing.

### Fixed

- All groups are now correctly deployed before resources that have authentication tied to them (`Transformation`,
  `FunctionSchedule`, `WorkflowTrigger`).

### Changed

- Running `cdf auth init/verify` no longer automatically activates Cognite Functions on private-link projects.

### Improved

- You now get a warning if you use the `$FILENAME` template incorrectly in the `CogniteFile`/`FileMetadata` resource.
- If a `{{ variable }}` replacement causes a `YAMLFormatError`, the Toolkit now gives you a hint on how to fix it.
- If you use a `dataSetId`, the Toolkit now gives you a hint to use `dataSetExternalId` instead.
- The Toolkit now falls back to reading a file as `utf-8` if the initial read fails.
- The Toolkit no longer gives `UnusedParameterWarning` for a `WorkflowVersion` using a `subworkflow` task.
- If a `Transformation`/`FunctionSchedule`/`WorkflowTrigger` fails to deploy due to missing environment variables,
  the Toolkit now gives a hint on how to fix it.
- [alpha feature] If you get a duplicated item due to using the `repeated-module` feature, the Toolkit now gives
  you a hint on how to fix it.

## [0.3.25] - 2025-01-10

### Added
4 changes: 4 additions & 0 deletions CHANGELOG.templates.md
@@ -15,6 +15,10 @@ Changes are grouped as follows:
- `Fixed` for any bug fixes.
- `Security` in case of vulnerabilities.

## [0.3.26] - 2025-01-16

No changes to templates.

## [0.3.25] - 2025-01-10

No changes to templates.
92 changes: 52 additions & 40 deletions CONTRIBUTING.md
@@ -2,22 +2,65 @@

## How to contribute

We are always looking for ways to improve the templates and the workflow. You can
[file bugs](https://github.com/cognitedata/toolkit/issues/new/choose) in the repo.
We are always looking for ways to improve the Cognite Toolkit CLI. You can
report bugs and ask questions in [our Cognite Hub group](https://hub.cognite.com/groups/cognite-data-fusion-toolkit-277).

We are also looking for contributions to new modules, especially example modules can be very
useful for others. Please open a PR with your suggested changes or propose a functionality
by creating an issue.
We are also looking for contributions to new modules (content) and to the Toolkit codebase that make configuring
Cognite Data Fusion easier, faster, and more reliable.

## Module ownership
## Improving the codebase

If you want to contribute to the codebase, you can do so by creating a new branch and
[opening a pull request](https://github.com/cognitedata/toolkit/compare). Prefix the PR title with the Jira issue
number in the form `[CDF-12345]`. A good PR should include a clear description of the change to help the reviewer
understand its nature and context.

### Linting and testing

The Cognite Toolkit CLI and modules have an extensive linting and test suite to ensure quality and speed of development.

See [pyproject.toml](pyproject.toml) for the linting and testing configuration.

See [tests](tests/README.md) for more information on how to run and maintain tests.

The `cdf_` prefixed modules are tested as part of the product development.

### Setting up the local environment

Your local environment needs a working Python installation and a virtual environment. We use `poetry` to manage
the environment and its dependencies.

Install pre-commit hooks by running `poetry run pre-commit install` in the root of the repository.

When developing in VS Code, the `cdf-tk-dev.py` script is useful for running the Toolkit. It sets the
environment and paths correctly (to avoid conflicts with an installed `cdf` package) and sets the
`SENTRY_ENABLED` environment variable to `false` to avoid sending errors to Sentry.
In `.vscode/launch.json` you will find several example debugging configurations.

### Essential code

- Main app entry point: [cognite_toolkit/_cdf.py](cognite_toolkit/_cdf.py)
- App subcommands: [cognite_toolkit/_cdf_tk/commands](cognite_toolkit/_cdf_tk/commands)
- Resource loaders: [cognite_toolkit/_cdf_tk/loaders](cognite_toolkit/_cdf_tk/loaders)
- Tests: [tests](tests)
- CI/CD: [.github/workflows](.github/workflows)

### Sentry

When you develop the Cognite Toolkit, you should avoid sending errors to Sentry. You can disable Sentry by setting
the environment variable `SENTRY_ENABLED=false`. This is set automatically when you use `cdf-tk-dev.py`.

## Contributing in modules

### Module ownership

The official `cdf_*` modules are owned by the respective teams in Cognite. Any changes to these
will be reviewed by the teams to ensure that nothing breaks. If you open a PR on these modules,
the PR will be reviewed by the team owning the module.

`cdf_infield_location` is an example of a team-owned module.

## Adding a new module
### Adding a new module

Adding a new module consists of the following steps:

@@ -70,37 +113,6 @@ Of course, where data population of e.g. data model is part of the configuration
The scripts are continuously under development to simplify management of configurations, and
we are pushing the functionality into the Python SDK when that makes sense.

## Testing

The `cdf_` prefixed modules should be tested as part of the product development. Our internal
test framework for scenario based testing can be found in the Cognite private big-smoke repository.

The `cdf-tk deploy` script command will clean configurations if you specify `--drop`, so you can
try to apply the configuration multiple times without having to clean up manually. If you want to delete
everything that is governed by your templates, including data ingested into data models, the `cdf-tk clean`
script command can be used to clean up configurations using the `scripts/delete.py` functions.

See [tests](tests/README.md) for more information on how to run tests.

## Setting up Environment

In order to develop `cdf-tk` you need to set up a development environment. You need a working python
installation and a virtual environment. We recommend using `poetry` to set up the environment as this is
the package tool that the toolkit repo uses also to create the installable python package.

When developing, you should use `cdf-tk-dev.py` to run the toolkit. This script will set the environment and paths
correctly (to avoid running the installed cdf-tk package) and also set the `SENTRY_ENABLED` environment
variable to `false` to avoid sending errors to Sentry.
In .vscode/launch.json you will see a number of examples of debugging configurations that you can use to debug.
If you use VSCode or another IDE supporting devcontainers, the easiest way to set up the environment is to
run in the Dev Container as configured in .devcontainer. It creates a virtual python environment in .venv/ that
will automatically be picked up by VSCode or poetry also if you want to run outside the devcontainer.

### Sentry

When you develop `cdf-tk` you should avoid sending errors to `sentry`. You can control `sentry` by setting
the `environment` variable `SENTRY_ENABLED=false`. This is set automatically when you use the `cdf-tk-dev.py`.

## Releasing

The templates are bundled with the `cdf-tk` tool, so they are released together.
@@ -132,12 +144,12 @@ To release a new version of the `cdf-tk` tool and the templates, you need to do
- deactivate
- run script again

1. Get approval to squash merge the branch into `main`:
1. Get approval to **squash merge** the branch into `main`:
1. Verify that all Github actions pass.
1. Create a release branch: `release-x.y.z` from `main`:
1. Create a new tag on the branch with the version number, e.g. `v0.1.0b3`.
2. Open a PR with the existing `release` branch as base comparing to your new `release-x.y.z` branch.
3. Get approval and merge (do not squash).
3. Get approval and merge (**do not squash**).
4. Verify that the Github action `release` passes and pushes to PyPi.
1. Create a new release on github.com with the tag and release notes:
1. Find the tag you created and create the new release.
2 changes: 1 addition & 1 deletion cdf.toml
@@ -29,4 +29,4 @@ dump = true
[modules]
# This is the version of the modules. It should not be changed manually.
# It will be updated by the 'cdf module upgrade' command.
version = "0.3.25"
version = "0.3.26"
2 changes: 1 addition & 1 deletion cognite_toolkit/_builtin_modules/cdf.toml
@@ -4,7 +4,7 @@ default_env = "<DEFAULT_ENV_PLACEHOLDER>"
[modules]
# This is the version of the modules. It should not be changed manually.
# It will be updated by the 'cdf module upgrade' command.
version = "0.3.25"
version = "0.3.26"


[plugins]
3 changes: 2 additions & 1 deletion cognite_toolkit/_cdf_tk/builders/_base.py
@@ -31,6 +31,7 @@
)
from cognite_toolkit._cdf_tk.utils import (
humanize_collection,
safe_read,
)


@@ -141,7 +142,7 @@ def get_loader(
# If there is a tableName field, it is a table, otherwise it is a database.
if any(
line.strip().startswith("tableName:") or line.strip().startswith("- tableName:")
for line in source_path.read_text().splitlines()
for line in safe_read(source_path).splitlines()
):
return RawTableLoader, None
else:
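The `safe_read` call swapped in above is what implements the changelog's new utf-8 fallback when reading files. The helper's implementation is not part of this diff; the following is a minimal sketch of what such a function might look like, where the exact fallback behavior is an assumption:

```python
from pathlib import Path


def safe_read(path: Path, encoding: str = "utf-8") -> str:
    """Read a text file, falling back to a lossy utf-8 decode on errors.

    Hypothetical sketch of the `safe_read` helper imported above; the real
    Toolkit implementation may differ.
    """
    try:
        return path.read_text(encoding=encoding)
    except UnicodeDecodeError:
        # Re-read as raw bytes and replace undecodable sequences instead of failing.
        return path.read_bytes().decode("utf-8", errors="replace")
```

With this behavior, a file containing bytes that are invalid utf-8 still yields a string (with replacement characters) rather than raising, which matches the changelog's description.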
2 changes: 1 addition & 1 deletion cognite_toolkit/_cdf_tk/builders/_datamodels.py
@@ -71,7 +71,7 @@ def _copy_graphql_to_build(
if "dml" in entry:
expected_filename = entry["dml"]
else:
expected_filename = f'{INDEX_PATTERN.sub("", source_file.source.path.stem.removesuffix(GraphQLLoader.kind).removesuffix("."))}.graphql'
expected_filename = f"{INDEX_PATTERN.sub('', source_file.source.path.stem.removesuffix(GraphQLLoader.kind).removesuffix('.'))}.graphql"
expected_path = source_file.source.path.parent / Path(expected_filename)

if expected_path in graphql_files:
12 changes: 11 additions & 1 deletion cognite_toolkit/_cdf_tk/builders/_file.py
@@ -10,7 +10,7 @@
)
from cognite_toolkit._cdf_tk.exceptions import ToolkitYAMLFormatError
from cognite_toolkit._cdf_tk.loaders import CogniteFileLoader, FileLoader, FileMetadataLoader
from cognite_toolkit._cdf_tk.tk_warnings import ToolkitWarning
from cognite_toolkit._cdf_tk.tk_warnings import LowSeverityWarning, ToolkitWarning


class FileBuilder(Builder):
@@ -55,6 +55,16 @@ def _expand_file_metadata(
and cls.template_pattern in raw_list[0].get("externalId", "")
)
if not is_file_template:
if (isinstance(raw_list, dict) and cls.template_pattern in raw_list.get("externalId", "")) or (
isinstance(raw_list, list)
and any(cls.template_pattern in entry.get("externalId", "") for entry in raw_list)
):
raw_type = "dictionary" if isinstance(raw_list, dict) else "list with multiple entries"
LowSeverityWarning(
f"Invalid file template {cls.template_pattern!r} usage detected in {module.relative_path.as_posix()!r}.\n"
f"The file template is expected in a list with a single entry, but got {raw_type}."
).print_warning()

return raw_list
if not (isinstance(raw_list, list) and raw_list and isinstance(raw_list[0], dict)):
raise ToolkitYAMLFormatError(
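The new branch above emits a `LowSeverityWarning` when the file-template external ID appears in the wrong YAML shape (a dictionary, or a list with multiple entries, instead of a single-entry list). The check can be reduced to a small standalone function; `TEMPLATE_PATTERN` and the return-a-message interface here are illustrative stand-ins, not the Toolkit's actual API:

```python
from __future__ import annotations

# Stand-in for the builder's template marker; the Toolkit's actual constant may differ.
TEMPLATE_PATTERN = "$FILENAME"


def check_template_shape(raw: dict | list) -> str | None:
    """Return a warning message when the file template is used with the wrong shape."""
    is_valid = (
        isinstance(raw, list)
        and len(raw) == 1
        and isinstance(raw[0], dict)
        and TEMPLATE_PATTERN in raw[0].get("externalId", "")
    )
    if is_valid:
        return None
    misused = (
        isinstance(raw, dict) and TEMPLATE_PATTERN in raw.get("externalId", "")
    ) or (
        isinstance(raw, list)
        and any(TEMPLATE_PATTERN in entry.get("externalId", "") for entry in raw)
    )
    if misused:
        raw_type = "dictionary" if isinstance(raw, dict) else "list with multiple entries"
        return f"file template expected in a list with a single entry, but got {raw_type}"
    return None  # No template usage at all: nothing to warn about.
```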
3 changes: 2 additions & 1 deletion cognite_toolkit/_cdf_tk/builders/_transformation.py
@@ -12,6 +12,7 @@
from cognite_toolkit._cdf_tk.exceptions import ToolkitYAMLFormatError
from cognite_toolkit._cdf_tk.loaders import TransformationLoader
from cognite_toolkit._cdf_tk.tk_warnings import ToolkitWarning
from cognite_toolkit._cdf_tk.utils import safe_write


class TransformationBuilder(Builder):
@@ -85,7 +86,7 @@ def _add_query(
)
elif query_file is not None:
destination_path = self._create_destination_path(query_file.source.path, "Query")
destination_path.write_text(query_file.content)
safe_write(destination_path, query_file.content)
relative = destination_path.relative_to(transformation_destination_path.parent)
entry["queryFile"] = relative.as_posix()
extra_sources.append(query_file.source)
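`safe_write` mirrors `safe_read` on the output side. Its implementation is also not part of this diff; a plausible sketch, assuming it simply forces a utf-8 encoding with lossy replacement of unencodable characters:

```python
from pathlib import Path


def safe_write(path: Path, content: str, encoding: str = "utf-8") -> None:
    """Write text with an explicit encoding, replacing unencodable characters.

    Hypothetical sketch of the `safe_write` helper used above; the real
    Toolkit implementation may differ.
    """
    path.write_text(content, encoding=encoding, errors="replace")
```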
3 changes: 1 addition & 2 deletions cognite_toolkit/_cdf_tk/cdf_toml.py
@@ -97,8 +97,7 @@ def load(cls, cwd: Path | None = None, use_singleton: bool = True) -> CDFToml:
alpha_flags = {clean_name(k): v for k, v in raw["alpha_flags"].items()}
if not alpha_flags and "feature_flags" in raw:
MediumSeverityWarning(
"The 'feature_flags' section has been renamed to 'alpha_flags'. "
"Please update your cdf.toml file."
"The 'feature_flags' section has been renamed to 'alpha_flags'. Please update your cdf.toml file."
).print_warning()
alpha_flags = {clean_name(k): v for k, v in raw["feature_flags"].items()}

7 changes: 7 additions & 0 deletions cognite_toolkit/_cdf_tk/client/_toolkit_client.py
@@ -31,6 +31,13 @@ def cloud_provider(self) -> Literal["azure", "aws", "gcp", "unknown"]:
else:
return "unknown"

@property
def is_private_link(self) -> bool:
if "cognitedata.com" not in self.base_url:
return False
subdomain = self.base_url.split("cognitedata.com", maxsplit=1)[0]
return "plink" in subdomain


class ToolkitClient(CogniteClient):
def __init__(self, config: ToolkitClientConfig | None = None) -> None:
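The new `is_private_link` property (used by the `cdf auth init/verify` change in this release) decides by inspecting the subdomain portion of the base URL. Reduced to a free function for illustration, with made-up example URLs:

```python
def is_private_link(base_url: str) -> bool:
    """True when the base URL points at a Cognite private-link (plink) cluster."""
    if "cognitedata.com" not in base_url:
        return False
    # Everything before the "cognitedata.com" suffix is the cluster subdomain.
    subdomain = base_url.split("cognitedata.com", maxsplit=1)[0]
    return "plink" in subdomain
```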
4 changes: 2 additions & 2 deletions cognite_toolkit/_cdf_tk/client/api/lookup.py
@@ -69,7 +69,7 @@ def id(
self._reverse_cache.update({v: k for k, v in lookup.items()})
if len(missing) != len(lookup) and not is_dry_run:
raise ResourceRetrievalError(
f"Failed to retrieve {self.resource_name} with external_id {missing}." "Have you created it?"
f"Failed to retrieve {self.resource_name} with external_id {missing}.Have you created it?"
)
return (
self._get_id_from_cache(external_id, is_dry_run, allow_empty)
@@ -116,7 +116,7 @@ def external_id(
self._cache.update({v: k for k, v in lookup.items()})
if len(missing) != len(lookup):
raise ResourceRetrievalError(
f"Failed to retrieve {self.resource_name} with id {missing}." "Have you created it?"
f"Failed to retrieve {self.resource_name} with id {missing}.Have you created it?"
)
return (
self._get_external_id_from_cache(id)
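Both `id` and `external_id` above populate a pair of mirrored caches, so a single network lookup fills both directions of the id ↔ external_id mapping. A minimal sketch of that pattern (the class and method names are illustrative, and the CDF call that resolves cache misses is omitted):

```python
from __future__ import annotations


class TwoWayCache:
    """Two-way id <-> external_id cache, kept in sync on every update."""

    def __init__(self) -> None:
        self._id_by_external: dict[str, int] = {}
        self._external_by_id: dict[int, str] = {}

    def update(self, lookup: dict[str, int]) -> None:
        """Store external_id -> id pairs and keep the reverse map in sync."""
        self._id_by_external.update(lookup)
        self._external_by_id.update({v: k for k, v in lookup.items()})

    def id(self, external_id: str) -> int | None:
        return self._id_by_external.get(external_id)

    def external_id(self, id: int) -> str | None:
        return self._external_by_id.get(id)
```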
4 changes: 2 additions & 2 deletions cognite_toolkit/_cdf_tk/client/data_classes/sequences.py
@@ -43,7 +43,7 @@ def __init__(
col_length = len(columns)
if wrong_length := [r for r in rows if len(r.values) != col_length]:
raise ValueError(
f"Rows { [r.row_number for r in wrong_length] } have wrong number of values, expected {col_length}"
f"Rows {[r.row_number for r in wrong_length]} have wrong number of values, expected {col_length}"
)
self.rows = rows
self.columns = columns
@@ -108,7 +108,7 @@ def __init__(
col_length = len(columns)
if wrong_length := [r for r in rows if len(r.values) != col_length]:
raise ValueError(
f"Rows { [r.row_number for r in wrong_length] } have wrong number of values, expected {col_length}"
f"Rows {[r.row_number for r in wrong_length]} have wrong number of values, expected {col_length}"
)
self.rows = rows
self.columns = columns
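The reformatted `ValueError` above belongs to a row-length validation that both `__init__` methods share: every row must have exactly as many values as there are columns. Extracted into a standalone sketch (the `Row` dataclass is a stand-in for the SDK's sequence-row class):

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Row:
    row_number: int
    values: list


def validate_rows(rows: list[Row], columns: list[str]) -> None:
    """Raise when any row's value count differs from the number of columns."""
    col_length = len(columns)
    # Walrus operator: bind the offending rows and test truthiness in one step.
    if wrong_length := [r for r in rows if len(r.values) != col_length]:
        raise ValueError(
            f"Rows {[r.row_number for r in wrong_length]} have wrong number of values, expected {col_length}"
        )
```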