Generic manager that understands json and yaml structure. #15193

Open
dariocc opened this issue Apr 20, 2022 · 26 comments
Labels
priority-3-medium Default priority, "should be done" but isn't prioritised ahead of others type:feature Feature (new functionality)

Comments

@dariocc

dariocc commented Apr 20, 2022

What would you like Renovate to be able to do?

We are using Renovate at work and we make use of the power of the regex manager to support custom configuration formats where we store versioning information of pinned dependencies.

We've realized, however, that many of our configuration formats are JSON or YAML, and that we'd benefit from having a language for extracting dependencies that understands the structure of those documents, as regexes become tricky to reason about and maintain once written.

We'd like to have a manager that is similar in concept and configuration to the regex manager but where instead of using regex to match dependencies it uses expressions that can navigate objects resulting from loading json and yaml.

If you have any ideas on how this should be implemented, please tell us here.

We have already implemented a solution to this problem that we call JSON manager. Usage and configuration are analogous to regexManagers, but we replace matchStrings with matchQueries:

{
  "jsonManagers": [
    {
      "fileMatch": ["<file match pattern>"],
      "matchQueries": ["<query>"],
      ... all regex manager configuration options are supported
    }
  ]
}

matchQueries is a collection of expressions whose goal is to transform input files into JSON objects with the following fields:

{ 
    "depName": "<dep-name>",
    "packageName": "<package-name>",
    "currentValue": "<current-value>",
    "currentDigest": "<current-digest>",
    "datasource": "<data-source>",
    "versioning": "<versioning>",
    "extractVersion": "<extract-version>",
    "registryUrl": "<registry-url>",
    "depType": "<dep-type>"
}
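
To make the field list above concrete, here is a minimal TypeScript sketch of how a query result could be typed and filtered down to the listed fields. The names `ExtractedDep` and `toExtractedDep` are hypothetical and are not taken from the Renovate codebase or from our implementation; treating every field as optional is also an assumption.

```typescript
// Hypothetical shape of one query result; field names come from the list above.
interface ExtractedDep {
  depName?: string;
  packageName?: string;
  currentValue?: string;
  currentDigest?: string;
  datasource?: string;
  versioning?: string;
  extractVersion?: string;
  registryUrl?: string;
  depType?: string;
}

const KNOWN_FIELDS: (keyof ExtractedDep)[] = [
  "depName",
  "packageName",
  "currentValue",
  "currentDigest",
  "datasource",
  "versioning",
  "extractVersion",
  "registryUrl",
  "depType",
];

// Keep only the known, string-valued fields of an arbitrary query result.
// Returns null when the result is not an object or has no usable field.
function toExtractedDep(raw: unknown): ExtractedDep | null {
  if (typeof raw !== "object" || raw === null || Array.isArray(raw)) {
    return null;
  }
  const source = raw as { [key: string]: unknown };
  const dep: ExtractedDep = {};
  for (const field of KNOWN_FIELDS) {
    const value = source[field];
    if (typeof value === "string") {
      dep[field] = value;
    }
  }
  return Object.keys(dep).length > 0 ? dep : null;
}
```

Dropping unknown or non-string fields is one possible policy; rejecting such results outright would be an equally valid choice.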

We chose JSONata expressions for the query language after evaluating a number of options. We wouldn't mind a different query language if that is preferred by the Renovate team.

Example

For the following file:

{
  "production": [
    {
      "version": "1.2.3",
      "package": "foo"
    }
  ],
  "development": [
    {
      "version": "4.5.6",
      "package": "bar"
    }
  ]
}

We could easily update production dependencies with the following query:

production.{ "depName": package, "currentValue": version }
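
For readers unfamiliar with JSONata, the intent of that query can be sketched in plain TypeScript. This is only an equivalent written by hand, not how the manager evaluates queries; the names `PinFile`, `PinnedPackage` and `extractProduction` are made up for the illustration, and JSONata itself may return a single object rather than a one-element array when there is only one match.

```typescript
// Hypothetical types describing the example file above.
interface PinnedPackage {
  version: string;
  package: string;
}

interface PinFile {
  production: PinnedPackage[];
  development: PinnedPackage[];
}

interface Dep {
  depName: string;
  currentValue: string;
}

const fileContent = `{
  "production": [{ "version": "1.2.3", "package": "foo" }],
  "development": [{ "version": "4.5.6", "package": "bar" }]
}`;

// Map every entry under "production" to the fields the manager expects.
function extractProduction(content: string): Dep[] {
  const doc = JSON.parse(content) as PinFile;
  return doc.production.map((entry) => ({
    depName: entry.package,
    currentValue: entry.version,
  }));
}
```

Calling `extractProduction(fileContent)` yields one dependency, `foo` at `1.2.3`, and ignores the `development` section entirely.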

Are we interested in implementing the feature ourselves?

Partially yes: We have already implemented the feature and we'd be happy to contribute it back to Renovate, but only if there is some initial interest, which is what we are trying to assess by creating this issue.

Because we try to keep our Renovate fork in sync with official Renovate, we've introduced our changes in a way that minimizes merge conflicts, at the cost of some code duplication. We've also introduced comments to guard ourselves and assist our sync efforts, and have kept additional *.spec.ts files instead of modifying Renovate defaults.

In order to contribute it to your project we'd need to do some clean-up, which is only worth it if you'll give serious consideration to our contribution. We also don't mind some additional modifications for alignment with the project.

We would not be addressing the duplication between the regexManager and jsonManager concepts, since that is likely an architectural decision that is out of the scope of our implementation. We would not mind if you prefer to use a different query language than JSONata, but we probably don't want to spend additional time setting it up (there are only a couple of lines of code, though, where JSONata is used).

We've made sure to cover the added functionality with unit-tests and are able to commit to the code-coverage settings you've configured.

What changes did we require?

Apart from the obvious addition of the manager implementation:

  • Changes to the configuration validation code. We conditionally execute custom validation for jsonManagers, similarly to how you deal with regexManagers. However, we also had to change optionParents to be an associative collection between option keys and a list of parents (since regexManagers and jsonManagers can both be parents of the same options).
  • Addition of a JSONManager to config/types. This somewhat invalidates the notion of a custom manager, given that a JSONManager is also a custom manager in a sense.
  • Some minor changes where conditional code existed to deal with the regex manager. We handle the json manager similarly.
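
The optionParents change in the first bullet can be sketched roughly as follows. This is an assumption-laden illustration, not Renovate's actual validation code: the map contents and the helper `isAllowedUnder` are hypothetical, and only show the idea of an option key pointing at a list of parents.

```typescript
// Hypothetical map: option name -> parent options it may appear under.
const optionParents: { [option: string]: string[] } = {
  fileMatch: ["regexManagers", "jsonManagers"],
  matchStrings: ["regexManagers"],
  matchQueries: ["jsonManagers"],
  depNameTemplate: ["regexManagers", "jsonManagers"],
  datasourceTemplate: ["regexManagers", "jsonManagers"],
};

// True when `option` is known and may be configured under `parent`.
function isAllowedUnder(option: string, parent: string): boolean {
  const parents: string[] | undefined = Object.prototype.hasOwnProperty.call(
    optionParents,
    option
  )
    ? optionParents[option]
    : undefined;
  return parents !== undefined && parents.indexOf(parent) !== -1;
}
```

With a single parent per option, `depNameTemplate` could belong to only one of the two managers; a list lets both share it.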

Is this a feature you are interested in implementing yourself?

Maybe

@dariocc dariocc added priority-5-triage status:requirements Full requirements are not yet known, so implementation should not be started type:feature Feature (new functionality) labels Apr 20, 2022
@viceice
Member

viceice commented Apr 20, 2022

please search existing issues, I'm pretty sure this is a duplicate.

@rarkins
Collaborator

rarkins commented Apr 20, 2022

In the meantime, I'd like to say that I love the idea and this is definitely something we've wanted to do. I hadn't seen JSONata but it looks good at first glance.

I think it could be a good idea to separate the concept of:

  • package file data (e.g. which could include common settings like datasource, versioning, and registryUrl(s)). This would be optional
  • dependency data (i.e. a list of extracted dependencies)
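
One possible reading of that split, as a hedged TypeScript sketch: shared settings live once at the package-file level and each dependency may override them. All type and function names here (`PackageFileData`, `DependencyData`, `ExtractionResult`, `applyPackageFileDefaults`) are hypothetical and only illustrate the idea.

```typescript
// Hypothetical: optional settings shared by every dependency in a file.
interface PackageFileData {
  datasource?: string;
  versioning?: string;
  registryUrls?: string[];
}

// Hypothetical: one extracted dependency, which may override shared settings.
interface DependencyData extends PackageFileData {
  depName: string;
  currentValue: string;
}

interface ExtractionResult {
  packageFile?: PackageFileData;
  deps: DependencyData[];
}

// Fill in package-file-level settings wherever a dependency does not set them.
function applyPackageFileDefaults(result: ExtractionResult): DependencyData[] {
  const shared: PackageFileData = result.packageFile || {};
  return result.deps.map((dep) => ({ ...shared, ...dep }));
}
```

Spreading `shared` first means a value set on the dependency itself always wins over the package-file value.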

I don't think it's essential to adopt semantics of the regex manager if there's better ways, although of course it's convenient if so. Does your approach support "nested" queries, e.g. you can search inside multiple levels?

I think a good test of your implementation would be if you could replicate basic package.json extraction capabilities such as depType=dependency/devDependency/etc.

@dariocc
Author

dariocc commented Apr 20, 2022

> Does your approach support "nested" queries, e.g. you can search inside multiple levels?

Nested queries in the sense applied by regexManager aren't necessary if you have a document-aware language such as JSONata, where you can transform JSON data into the dependency data you need. Meaning: the language itself gives you the ability to apply consecutive transformations to your input document. You don't need to handle the recursive strategy yourselves.

> I love the idea and this is definitely something we've wanted to do.

I'm happy to dedicate some time to create an upstreamable version of our implementation so as you can see it for real. As I mentioned, I only need to know that there is at least some initial interest.

> I don't think it's essential to adopt semantics of the regex manager if there's better ways

I don't know if there are better semantics, but I haven't found any limitation in those of the regex manager. They are very flexible and applicable to the JSON manager that I'm suggesting, with the exception of the match strategy (not really valuable):

export interface JSONManager {
  fileMatch: string[];
  matchQueries: string[];
  depNameTemplate?: string;
  datasourceTemplate?: string;
  packageNameTemplate?: string;
  versioningTemplate?: string;
  autoReplaceStringTemplate?: string;
}

I think conceptually they are very close: the regex manager focuses on extracting dependencies from any text. The json manager focuses on extracting dependencies from a structured document.

In fact, the notion of a regexManager and JSONManager could be generalized to the notion of a custom manager (you even use this name in config/types.ts) with different dependency extraction languages and language-specific options.

> I think a good test of your implementation would be if you could replicate basic package.json extraction capabilities such as depType=dependency/devDependency/etc.

While the idea is not to replace the npm manager you can certainly find a JSONata expression that does what you want. Most of our own custom configuration file formats are fairly simple... often it is about selecting specific nested configuration values.

Depending on how you balance readability vs conciseness and avoiding duplication, you may decide to have one or several jsonManagers.

For example, you could have multiple matchQueries, one per dependency type:

dependencies.$each(function($version, $dependency) { { "depName": $substringBefore($dependency, "/"), "currentValue": $version , "depType": "dependencies"} })

... similar for devDependencies and others

Or you may be able to resolve that in a single JSONata query:

(
$extract := function ($deps, $depType) {
  $deps.$each(
    function($version, $dependency) { 
      { 
        "depName": $substringBefore($dependency, "/"), 
        "currentValue": $version , 
        "depType": $depType
      } 
    })
};

$sift(function($_, $k) { $k ~> /dependencies|devDependencies|optionalDependencies|peerDependencies/} ).$each( function($v,$k) { $extract($v, $k) }).$
)

You can try this yourself by copy&pasting this into https://try.jsonata.org/ and use it on a package.json file of your choice.

However, notice that the choice of JSONata is just an implementation detail and that I'm no expert in writing these expressions.
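
For comparison, the same package.json extraction can be sketched in plain TypeScript without any query language. This is a hand-written equivalent under stated assumptions, not the manager's code: `extractDeps` and `DEP_TYPES` are hypothetical names, and unlike the query above it keeps the full package name as depName instead of applying `$substringBefore`.

```typescript
interface Dep {
  depName: string;
  currentValue: string;
  depType: string;
}

// The four package.json sections matched by the query above.
const DEP_TYPES: string[] = [
  "dependencies",
  "devDependencies",
  "optionalDependencies",
  "peerDependencies",
];

// Walk each dependency section and emit one entry per package.
function extractDeps(content: string): Dep[] {
  const doc = JSON.parse(content) as { [key: string]: unknown };
  const deps: Dep[] = [];
  for (const depType of DEP_TYPES) {
    const section = doc[depType];
    if (typeof section !== "object" || section === null) {
      continue; // section missing or not an object
    }
    const versions = section as { [name: string]: unknown };
    for (const depName of Object.keys(versions)) {
      const currentValue = versions[depName];
      if (typeof currentValue === "string") {
        deps.push({ depName, currentValue, depType });
      }
    }
  }
  return deps;
}
```

The two nested loops are what a transformation language expresses in a single query, which is the main argument for not needing a recursive match strategy.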

@dariocc
Author

dariocc commented Apr 20, 2022

> please search existing issues, I'm pretty sure this is a duplicate.

Could you suggest a query to find such a duplicate?

When I tried before creating this issue I wasn't able to: "json" produced too many results, jq or jsonata don't give any useful result, generic json didn't either.

So I can only apologize for the noise.

@rarkins
Collaborator

rarkins commented Apr 20, 2022

I'm not sure we ever wrote up the idea in an issue; I can't find it now. There was the beginnings of a similar idea for datasources: #6223

@viceice
Member

viceice commented Apr 20, 2022

> > please search existing issues, I'm pretty sure this is a duplicate.
>
> Could you suggest a query to find such a duplicate?
>
> When I tried before creating this issue I wasn't able to: "json" produced too many results, jq or jsonata don't give any useful result, generic json didn't either.
>
> So I can only apologize for the noise.

couldn't find it either, was probably only a slack or GitHub discussion 🤷‍♂️

@viceice
Member

viceice commented Apr 20, 2022

jsonata:

we need to be sure it can't load any additional code from global objects or external files.

i can see you can define some kind of functions, that's why I'm a little concerned.

@dariocc
Author

dariocc commented Apr 20, 2022

I don't fully understand how being able to write functions relates to being able to load code from external files; however, I do understand that you want to be certain this won't introduce security vulnerabilities.

I was able to find https://snyk.io/vuln/npm:jsonata, in case it is of any help.

The JSONata language itself doesn't provide any language feature for loading information from files that I'm aware of. At least I didn't find any in https://docs.jsonata.org/overview.

The package doesn't introduce any additional dependency (https://www.npmjs.com/package/jsonata).
To me it sounds like it doesn't represent any greater risk than any of the other dependencies Renovate already has.

However, remember that the choice of JSONata is a detail, and the idea of a generic json manager is still useful per se. If there is a different json query language that you would feel more comfortable with, I'd be happy to use it instead.

@dariocc
Author

dariocc commented Apr 20, 2022

Let me know if you'd be interested in me creating a PR with an example of what a generic JSON manager could look like.
And thanks in any case for Renovate, which is an awesome piece of software :).

@viceice
Member

viceice commented Apr 20, 2022

i think an alternative is jsonpath, which is similar to xpath

@viceice
Member

viceice commented Apr 20, 2022

> Let me know if you'd be interested in me creating a PR with an example of what a generic JSON manager could look like.
> And thanks in any case for Renovate, which is an awesome piece of software :).

yes, go ahead. had a quick look at jsonata and it seems safe by default.

@rarkins
Collaborator

rarkins commented Apr 20, 2022

The question is whether it opens up too much of an attack surface, intentionally or not. We don't want to play a game of cat and mouse locking down its functions.

@rarkins
Collaborator

rarkins commented Apr 20, 2022

I also took a deeper look and jsonata seems OK. It would be nice if you could design this PR in such a way that we could offer jsonpath as an alternative in future if someone wanted to implement it.

@rarkins rarkins added priority-3-medium Default priority, "should be done" but isn't prioritised ahead of others status:ready and removed priority-5-triage status:requirements Full requirements are not yet known, so implementation should not be started labels Apr 20, 2022
@dariocc
Author

dariocc commented Apr 20, 2022

> The question is whether it opens up too much of an attack surface, intentionally or not

Yes that is understandable.

> i think an alternative is jsonpath, which is similar to xpath

I initially considered using jsonpath, but in my experimentation I found that I wasn't able to transform a json document into the collection of dependencies, which is the same concept as with the regex manager. If you know this transformation is possible and you can provide an example, I'm happy to give it a second try.

What I imagine may be possible is to use jsonpath in combination with a recursive strategy, similar to what you do with the regex manager, but I believe that relying on a language that can do the transformation is more advantageous than relying on one that can only select information.

My preference would've been jq given that it is a widely used json expression language, but the libraries I found weren't as used or as mature as JSONata (reference: https://www.npmtrends.com/jq-vs-json-query-vs-jsonata-vs-nools).

> It would be nice if you could design this PR in such a way that we could offer jsonpath as an alternative in future if someone wanted to implement it.

Assuming JSONPath can produce the necessary transformation (it is possible, for example, with jq), the idea I had in mind to enable such a scenario would be to have an additional configuration option that determines the language of the query. Or perhaps the introduction of a customManagers option that can hold different types of "custom managers" (regexManager, jsonataManager, jsonpathManager).
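
A rough sketch of the first variant, where a configuration option selects the query language and extraction is dispatched on it. Everything here is illustrative: `queryLanguage`, `CustomManager`, `extractors` and `extractWith` are not existing Renovate options or functions, only the regex language is wired up, and the convention that capture group 1 is the name and group 2 the version is an assumption made for the example.

```typescript
// Illustrative names only; none of these are existing Renovate options.
type QueryLanguage = "regex" | "jsonata" | "jsonpath";

interface Dep {
  depName: string;
  currentValue: string;
}

interface CustomManager {
  queryLanguage: QueryLanguage;
  fileMatch: string[];
  matchQueries: string[];
}

type Extractor = (content: string, query: string) => Dep[];

// Regex extractor: by assumption, group 1 is the name and group 2 the version.
const regexExtractor: Extractor = (content: string, query: string): Dep[] => {
  const pattern = new RegExp(query, "g");
  const deps: Dep[] = [];
  let match: RegExpExecArray | null = pattern.exec(content);
  while (match !== null) {
    if (match[1] !== undefined && match[2] !== undefined) {
      deps.push({ depName: match[1], currentValue: match[2] });
    }
    if (match[0] === "") {
      pattern.lastIndex += 1; // avoid looping forever on zero-length matches
    }
    match = pattern.exec(content);
  }
  return deps;
};

// Other languages would register their own extractor here.
const extractors: { [language: string]: Extractor } = {
  regex: regexExtractor,
};

// Run every query of the manager with the extractor for its language.
function extractWith(manager: CustomManager, content: string): Dep[] {
  const extractor: Extractor | undefined = Object.prototype.hasOwnProperty.call(
    extractors,
    manager.queryLanguage
  )
    ? extractors[manager.queryLanguage]
    : undefined;
  if (extractor === undefined) {
    throw new Error("No extractor registered for " + manager.queryLanguage);
  }
  let deps: Dep[] = [];
  for (const query of manager.matchQueries) {
    deps = deps.concat(extractor(content, query));
  }
  return deps;
}
```

The customManagers variant would look much the same, with the manager type playing the role of `queryLanguage`.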

But in any case, let's start with me sharing what we have and then discuss any further on top of it.

@dariocc dariocc changed the title from "Generic manager that understand json and yaml structure." to "Generic manager that understands json and yaml structure." Apr 21, 2022
@ChipWolf

As a result of supporting JSON it'd be relatively easy to support YAML/TOML etc. can we scope this in?

@dariocc
Author

dariocc commented May 23, 2022

It would be a matter of changing the deserializer used to load the configuration file.
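
A minimal sketch of what swapping the deserializer could mean, assuming the query step only ever sees the loaded object. The registry and the names `deserializers`, `registerDeserializer` and `loadDocument` are hypothetical; only JSON is registered here because it needs no extra library.

```typescript
type Deserializer = (content: string) => unknown;

// Hypothetical registry keyed by lower-case file extension.
const deserializers: { [extension: string]: Deserializer } = {
  json: (content: string): unknown => JSON.parse(content),
};

// Add or replace the deserializer for one file extension.
function registerDeserializer(extension: string, deserializer: Deserializer): void {
  deserializers[extension.toLowerCase()] = deserializer;
}

// Pick a deserializer from the file extension and load the document.
function loadDocument(fileName: string, content: string): unknown {
  const extension = fileName.slice(fileName.lastIndexOf(".") + 1).toLowerCase();
  const deserializer: Deserializer | undefined = Object.prototype.hasOwnProperty.call(
    deserializers,
    extension
  )
    ? deserializers[extension]
    : undefined;
  if (deserializer === undefined) {
    throw new Error("No deserializer registered for ." + extension);
  }
  return deserializer(content);
}
```

A YAML or TOML parser could be registered the same way through `registerDeserializer`, and the queries would then run unchanged on the resulting object.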

@rarkins
Collaborator

rarkins commented Jul 11, 2022

@dariocc how is this going - anything we can do to help you get this to a PR?

@dariocc
Author

dariocc commented Aug 9, 2022

Sorry, I got stuck in some formalities to get approval to contribute to the project and then I had some personal matters to attend. I'll take a look in the coming days. Hoping for an easy merge 🤞.

@dariocc
Author

dariocc commented Aug 9, 2022

Feel free to take a look at #17077, which I left in draft until I have time to set up a real repo. The readme.md should be useful for understanding how it should work. The original code on which this work is based has, however, been tested on a real repo.

@Morl99

Morl99 commented May 5, 2023

We had an internal discussion in a Community at Deutsche Bahn and have an interest in this issue, since we have built something based on the RegexManager that is able to fetch generic dependency information out of a JSON File (but with quite a few downsides).

I took a look at the approach in the PR and feel like it would help to cover a lot of use cases. @dariocc would you be willing to pick it up again and work on the few remarks that are still left in the PR?
@rarkins is there anything new to take care of due to the time that has passed since the initial creation of the PR?

@rarkins
Collaborator

rarkins commented May 5, 2023

We have some chained dependencies here, where we were hoping to do some renaming/refactoring of the regex manager, but that seemed to get derailed in #19133

We should try to restart that so that the insertion of this manager is cleaner.

But overall I would love to see this feature get landed

@dariocc
Author

dariocc commented May 7, 2023

> I took a look at the approach in the PR and feel like it would help to cover a lot of use cases. @dariocc would you be willing to pick it up again and work on the few remarks that are still left in the PR?

I don't think it is up to me at this point :).

But great to hear others are also interested in this.

@benedikt-bartscher

I am also interested in a generic json/yaml manager. This would allow basic support for projects like https://github.com/kluctl/kluctl in minutes.

@erik-bershel

We are working on integrating your product into the automation of version control for software that is installed during the build process of base images for users of GitHub Actions. We are also very interested in the implementation of mechanisms for working with JSON and/or YAML formats.
Perhaps we can help in some way?

@rarkins
Collaborator

rarkins commented May 13, 2023

What's needed is:

Possibly it might be easiest to replace #19133, and this would be a welcome PR
