Centralize data to lib/data for easier contributing #29942

Closed
rarkins opened this issue Jun 29, 2024 · 17 comments · Fixed by #32151
Assignees
Labels
priority-2-high Bugs impacting wide number of users or very important features type:refactor Refactoring or improving of existing code

Comments

@rarkins
Collaborator

rarkins commented Jun 29, 2024

Describe the proposed change(s).

We have a lot of "crowdsourced" data in the repo, spread through different locations.

For example:

We should centralize this into lib/data so that it's easier for one-time or occasional contributors to find the right location to edit.

We should keep this folder as "raw" as possible, with any wrapper code living elsewhere. One topic per file. Files should be in .json format.

There should be a readme in the folder which clearly describes what each file/dataset is for.

@rarkins rarkins added priority-3-medium Default priority, "should be done" but isn't prioritised ahead of others type:refactor Refactoring or improving of existing code labels Jun 29, 2024
@RahulGautamSingh

This comment was marked as resolved.

@rarkins

This comment was marked as resolved.

@rarkins
Collaborator Author

rarkins commented Jul 1, 2024

I added a requirement that the data be in .json format. @viceice is that still ok for our build process?

@viceice
Member

viceice commented Jul 1, 2024

should work, maybe use jsonc to allow comments?

@rarkins
Collaborator Author

rarkins commented Jul 1, 2024

I'm worried that jsonc isn't easily parseable by other ecosystems. For example we may internalize these in Mend systems using a Java backend
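
To illustrate the concern, here is a minimal sketch showing that a strict JSON parser rejects a file as soon as it has a comment in it. The package name and URL are made up for the example, and `parsesAsStrictJson` is a hypothetical helper, not something from the Renovate codebase.

```typescript
// Strict JSON has no comment syntax, so JSONC input fails in JSON.parse.
const strictJson = '{ "example-package": "https://github.com/example/repo" }';
const withComment = `{
  // maintained by hand
  "example-package": "https://github.com/example/repo"
}`;

// Hypothetical helper: reports whether the text is valid strict JSON.
function parsesAsStrictJson(text: string): boolean {
  try {
    JSON.parse(text);
    return true;
  } catch {
    return false;
  }
}

console.log(parsesAsStrictJson(strictJson)); // true
console.log(parsesAsStrictJson(withComment)); // false
```

Any consumer that relies on a plain JSON parser would hit the same failure, which is why plain `.json` is the safer common denominator.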

@rarkins rarkins added priority-2-high Bugs impacting wide number of users or very important features and removed priority-3-medium Default priority, "should be done" but isn't prioritised ahead of others labels Jul 1, 2024
@HonkingGoose
Collaborator

When you're moving and renaming files, remember to update the "edit button links/overrides".

The published docs have an edit button which takes you to the file on GitHub. For some files the default path assumption of the edit button tool is wrong, so we have manual overrides. I remember we have overrides for things like the Renovate preset source files, the readme.md files for the Renovate managers, and so on.

@RahulGautamSingh
Collaborator

RahulGautamSingh commented Jul 9, 2024

I don't think the metadata-manual for the packages has been documented yet.

How should we go about documenting this? Should they be included in the Included Presets section like the rest of the presets?

@RahulGautamSingh
Collaborator

RahulGautamSingh commented Jul 9, 2024

@HonkingGoose can you review the first draft of the readme file? I am trying to gauge what information should be included for each file. Currently I have added the preset description, why the preset is needed, and how the preset is organized or how a new one is added. For example, monorepos are organized based on sourceUrls and packagePatterns.

Readme
The `lib/data` folder houses a collection of crowdsourced data files (presets) that are useful for various automated actions.
For example: grouping related packages using the monorepo presets, replacing renamed packages using the replacement presets, or using the manual sourceUrl and changelogUrl entries to provide changelog URLs for packages that do not include them in their API response.

Below, you'll find detailed information on each file contained in this folder:

1. `monorepo.json`
The monorepo.json file houses all the monorepo presets. These presets are used to group related packages together.
Why packages count as related differs from user to user, but generally it is because the packages depend on each other or because they are located in the same place (repo or org).

We currently support three methods for grouping packages:

`repoGroups`: Groups packages based on their source repository URLs.

`orgGroups`: Groups packages based on their organization URLs.

`patternGroups`: Groups packages based on their package names.

@HonkingGoose
Collaborator

HonkingGoose commented Jul 9, 2024

Hi @RahulGautamSingh

I improved and expanded your draft. The readme is easier to read and has more information now.

Can you please do these todos?

  • Expand the "summary" table so it has each file in the lib/data folder.
  • Update the .json filenames.
  • Add any missing information.
  • Fix any technical errors.
  • Explain how to use repoGroups, orgGroups and patternGroups. I don't see them in the Renovate docs.
First draft from HonkingGoose

# Introduction

The `lib/data` folder has all our crowdsourced data files.
This readme explains what each file is used for.

## Summary

| File                                    | What is the file about?                  |
| --------------------------------------- | ---------------------------------------- |
| `monorepo.json`                         | Group related packages into a single PR. |
| `filename-for-replacement-presets.json` | Rename old packages to new replacement.  |
| `filename-for-changelogs.json`          | Tell Renovate where to find changelogs.  |

## Group related packages (`monorepo.json`)

The `monorepo.json` file has all the monorepo presets.

Monorepo presets group related packages, so they are updated with a single Renovate PR.

We usually group packages that:

- depend on each other, or
- are in the same repository, or
- are in the same organization

### Ways to group packages

There are three ways to group packages:

| I want to group based on | Method          |
| ------------------------ | --------------- |
| Source repository URLs   | `repoGroups`    |
| Organization URLs        | `orgGroups`     |
| Package name(s)          | `patternGroups` |

## Rename old packages

The `filename-for-replacement-presets.json` file has all the replacement presets.

When a package gets renamed, you need to tell Renovate:

- the old package name
- the new package name
- add anything I'm forgetting to list here

## Tell Renovate where to find changelogs

The `filename-for-changelogs.json` has all the changelog information.

Renovate nearly always finds, and displays, the changelog for a package update automatically.

To find the changelog, Renovate needs the:

- URL to the changelog file
- URL to the source

Usually, the API for the package to be updated gives Renovate the correct info.
If this does not happen, for whatever reason, Renovate can not show the changelog.

You can use these config options to let Renovate find the correct changelog:

- [`sourceUrl`](https://docs.renovatebot.com/configuration-options/#sourceurl)
- [`changelogUrl`](https://docs.renovatebot.com/configuration-options/#changelogurl)

Read the [Renovate docs, key concepts page for changelogs](https://docs.renovatebot.com/key-concepts/changelogs/) to learn more about how Renovate fetches and displays changelogs.

@HonkingGoose
Collaborator

How about:

  • Adding a "developer-commentary:" field to the .json files?
  • Optional: strip out dev commentary in docs build step?
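
A rough sketch of what such a strip step could look like, assuming the commentary lives under a key literally named `developer-commentary` at any nesting level. The key name comes from the suggestion above; `stripCommentary` and the sample data are hypothetical and not part of the current build.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Assumed key name, taken from the suggestion in this comment.
const COMMENTARY_KEY = 'developer-commentary';

// Returns a deep copy of the value with every commentary field removed.
function stripCommentary(value: Json): Json {
  if (Array.isArray(value)) {
    return value.map((item) => stripCommentary(item));
  }
  if (value !== null && typeof value === 'object') {
    const result: { [key: string]: Json } = {};
    for (const key of Object.keys(value)) {
      if (key !== COMMENTARY_KEY) {
        result[key] = stripCommentary(value[key]);
      }
    }
    return result;
  }
  return value;
}

// Made-up sample data, only for illustration.
const raw: Json = {
  'developer-commentary': 'Top-level note for maintainers.',
  'example-package': {
    'developer-commentary': 'Upstream registry lacks metadata.',
    url: 'https://github.com/example/repo',
  },
};

console.log(JSON.stringify(stripCommentary(raw)));
// {"example-package":{"url":"https://github.com/example/repo"}}
```

Because the commentary is an ordinary string field, the files stay valid strict JSON for every other consumer.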

@RahulGautamSingh
Collaborator

RahulGautamSingh commented Jul 17, 2024

Explain how to use repoGroups, orgGroups and patternGroups. I don't see them in the Renovate docs.

I think the Way to group packages section you added and a quick glance at the monorepo.json file will be enough for users to figure it out.

Here's an updated version of the `readme`
# Introduction

The `lib/data` folder has all our crowdsourced data files.
This readme explains what each file is used for.

## Summary

| File                           | What is the file about?                  |
| ------------------------------ | ---------------------------------------- |
| `monorepo.json`                | Group related packages into a single PR. |
| `replacements.json`            | Rename old packages to new replacement.  |
| `changelogs.json`              | Tell Renovate where to find changelogs.  |
| `source-urls.json`             | Tell Renovate the source URL of packages.|

## Group related packages (`monorepo.json`)

The `monorepo.json` file has all the monorepo presets.

Monorepo presets group related packages, so they are updated with a single Renovate PR.

### Ways to group packages

There are three ways to group packages:

| Grouping Criteria        | Method          |
| ------------------------ | --------------- |
| Source repository URLs   | `repoGroups`    |
| Organization URLs        | `orgGroups`     |
| Package name pattern(s)  | `patternGroups` |

Each method allows you to group related packages based on different criteria:

`repoGroups`: Group packages from the same source repository.
`orgGroups`: Group packages from the same organization.
`patternGroups`: Group packages based on name patterns or prefixes.
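
As a rough illustration of how the three methods differ, here is a sketch of a lookup over a simplified data shape. The shape (group name mapped to a list of strings), the matching rules, the precedence order (repo, then org, then pattern), and every name in the sample data are assumptions made for this example, not a description of how Renovate processes `monorepo.json`.

```typescript
// Assumed, simplified shape: group name -> list of URLs or name patterns.
interface MonorepoData {
  repoGroups: Record<string, string[]>;
  orgGroups: Record<string, string[]>;
  patternGroups: Record<string, string[]>;
}

interface PackageInfo {
  name: string;
  sourceUrl?: string;
}

// Made-up sample data.
const sample: MonorepoData = {
  repoGroups: { 'example-repo': ['https://github.com/example-org/example-repo'] },
  orgGroups: { 'example-org': ['https://github.com/example-org/'] },
  patternGroups: { 'example-scope': ['^@example-scope/'] },
};

// Hypothetical lookup: returns the first matching group name, if any.
function findGroup(pkg: PackageInfo, data: MonorepoData): string | undefined {
  const url = pkg.sourceUrl;
  if (url !== undefined) {
    // repoGroups: the source URL must equal one of the listed repository URLs.
    for (const group of Object.keys(data.repoGroups)) {
      if ((data.repoGroups[group] || []).indexOf(url) !== -1) return group;
    }
    // orgGroups: the source URL must start with one of the organization URLs.
    for (const group of Object.keys(data.orgGroups)) {
      if ((data.orgGroups[group] || []).some((org) => url.indexOf(org) === 0)) return group;
    }
  }
  // patternGroups: the package name must match one of the regular expressions.
  for (const group of Object.keys(data.patternGroups)) {
    if ((data.patternGroups[group] || []).some((p) => new RegExp(p).test(pkg.name))) return group;
  }
  return undefined;
}

console.log(findGroup({ name: 'a', sourceUrl: 'https://github.com/example-org/example-repo' }, sample)); // example-repo
console.log(findGroup({ name: 'b', sourceUrl: 'https://github.com/example-org/other' }, sample)); // example-org
console.log(findGroup({ name: '@example-scope/core' }, sample)); // example-scope
```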

## Rename old packages (`replacements.json`)

The `replacements.json` file has all the replacement presets.

When a package gets renamed, you need to tell Renovate:

- the datasource of the package
- the old package name
- the new package name
- the last version available for the old package name
- the first version available for the new package name
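
The five pieces of information above can be pictured as one record that is turned into a package rule. This is a sketch only: the `ReplacementEntry` field names and the `toReplacementRule` helper are hypothetical, and the actual layout of `replacements.json` may differ. The rule keys (`matchDatasources`, `matchPackageNames`, `replacementName`, `replacementVersion`) are existing Renovate config options.

```typescript
// Hypothetical record holding the five pieces of information listed above.
interface ReplacementEntry {
  datasource: string;
  oldName: string;
  newName: string;
  lastOldVersion: string;
  firstNewVersion: string;
}

interface ReplacementRule {
  description: string;
  matchDatasources: string[];
  matchPackageNames: string[];
  replacementName: string;
  replacementVersion: string;
}

// Hypothetical helper: builds a package rule from one entry.
function toReplacementRule(entry: ReplacementEntry): ReplacementRule {
  return {
    description:
      `${entry.oldName} (last release ${entry.lastOldVersion}) was renamed to ` +
      `${entry.newName} (first release ${entry.firstNewVersion})`,
    matchDatasources: [entry.datasource],
    matchPackageNames: [entry.oldName],
    replacementName: entry.newName,
    replacementVersion: entry.firstNewVersion,
  };
}

// Made-up package names and versions.
const exampleRule = toReplacementRule({
  datasource: 'npm',
  oldName: 'old-example-package',
  newName: '@example/new-package',
  lastOldVersion: '1.9.0',
  firstNewVersion: '2.0.0',
});

console.log(JSON.stringify(exampleRule, null, 2));
```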

## Tell Renovate where to find changelogs (`changelog-urls.json`)

The `changelog-urls.json` has all the changelog information.

Renovate nearly always finds, and displays, the changelog for a package update automatically.

To find the changelog, Renovate needs the:

- Name of the package
- URL to the changelog file

Usually, the API for the package to be updated gives Renovate the correct info.
If this does not happen, for whatever reason, Renovate can not show the changelog.

You can use these config options to let Renovate find the correct changelog:

- [`changelogUrl`](https://docs.renovatebot.com/configuration-options/#changelogurl)

Read the [Renovate docs, key concepts page for changelogs](https://docs.renovatebot.com/key-concepts/changelogs/) to learn more about how Renovate fetches and displays changelogs.

## Tell Renovate where to find source urls (`source-urls.json`)

The `source-urls.json` has the infromation on source URL of multiple packages.

Renovate nearly always finds, and displays, the source for a package update automatically.
Usually, the API for the package to be updated gives Renovate the correct info.
If this does not happen, for whatever reason, Renovate cannot link to the source of the package and might not be able to look up changelogs.

To find the source URL, Renovate needs the:

- Name of the package
- URL to the source

To verify if Renovate can find source URLs for your package:

1. Identify the datasource your package uses.
2. Check the documentation page for that specific datasource.
3. Look for a table in the docs that indicates whether the datasource returns source URLs.

You can use these config options to let Renovate find the correct source URL:

- [`sourceUrl`](https://docs.renovatebot.com/configuration-options/#sourceurl)
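
To make the role of the manual data concrete, here is a sketch of a fallback lookup. It assumes the file is keyed by datasource and then by package name, and that the registry value wins when present. Both are assumptions for this example (the real precedence may differ), and `resolveSourceUrl` plus the sample entries are hypothetical.

```typescript
// Assumed shape: datasource -> package name -> source URL.
type ManualSourceUrls = Record<string, Record<string, string>>;

// Made-up sample data.
const manualSourceUrls: ManualSourceUrls = {
  npm: { 'example-package': 'https://github.com/example/example-package' },
};

// Hypothetical helper: prefer what the registry returned, else use the manual entry.
function resolveSourceUrl(
  datasource: string,
  packageName: string,
  fromRegistry: string | undefined,
  manual: ManualSourceUrls
): string | undefined {
  if (fromRegistry) {
    return fromRegistry;
  }
  const perDatasource = manual[datasource];
  return perDatasource ? perDatasource[packageName] : undefined;
}

// The registry returned nothing, so the manual entry is used.
console.log(resolveSourceUrl('npm', 'example-package', undefined, manualSourceUrls));
// https://github.com/example/example-package

// No registry value and no manual entry.
console.log(resolveSourceUrl('npm', 'unknown-package', undefined, manualSourceUrls));
// undefined
```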

I have divided the metadata-manual info into 2 json files: changelog-urls.json & source-urls.json for better readability & navigation.

Also, regarding the metadata-manual files. This info is not documented, yet. Should it be documented in the Included Presets section?

@HonkingGoose
Collaborator

Answers to your questions

I have divided the metadata-manual info into 2 json files: changelog-urls.json & source-urls.json for better readability & navigation.

Good! Please make sure the filenames in the readme are correct!

Also, regarding the metadata-manual files. This info is not documented, yet. Should it be documented in the Included Presets section?

It should at least be documented somewhere. I don't know the best place, so for now put it in the Included Presets section. 😉

Todos

Can you please make these changes?

  • Put info in table, or make bulleted list
  • Fix typo
  • Rewrite section

Put info in table, or make bulleted list

There are three ways to group packages:

| Grouping Criteria        | Method          |
| ------------------------ | --------------- |
| Source repository URLs   | `repoGroups`    |
| Organization URLs        | `orgGroups`     |
| Package name pattern(s)  | `patternGroups` |

Each method allows you to group related packages based on different criteria:

repoGroups: Group packages from the same source repository.
orgGroups: Group packages from the same organization.
patternGroups: Group packages based on name patterns or prefixes.

Please move the explanation into the table. But if that would make the table too big: make a bulleted list for the items. 😉

Fix typo

Please fix the typo: change infromation to information.

Rewrite section

Change this:

To verify if Renovate can find source URLs for your package:

  1. Identify the datasource your package uses.
  2. Check the documentation page for that specific datasource.
  3. Look for a table in the docs that indicates whether the datasource returns source URLs.

You can use these config options to let Renovate find the correct source URL:

Into this:

To check if Renovate can find the source URLs for your package:

1. Find the datasource for your package.
1. Read the Renovate docs for the datasource.
1. Look for a table in the docs that shows if the datasource returns source URLs.

If Renovate does not find the right source URLs automatically, use the [`sourceUrl` config option](https://docs.renovatebot.com/configuration-options/#sourceurl).

@RahulGautamSingh
Collaborator

I have created PRs for all files mentioned in the description. How should we proceed further?

@viceice
Member

viceice commented Jul 21, 2024

  1. create a JSON Schema for each data file, for better contributor UX
  2. validate data files against schema when linting
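
As a sketch of the second step, here is a hand-written structural check standing in for real JSON Schema validation. In practice a schema validator library would do this work. The data shape (datasource, then package name, then URL) and the `validateSourceUrls` helper are assumptions for illustration only.

```typescript
// Hypothetical lint check for a source-URL data file.
// Returns a list of human-readable problems; an empty list means the data passed.
function validateSourceUrls(data: unknown): string[] {
  if (typeof data !== 'object' || data === null || Array.isArray(data)) {
    return ['root: must be an object'];
  }
  const errors: string[] = [];
  const root = data as Record<string, unknown>;
  for (const datasource of Object.keys(root)) {
    const packages = root[datasource];
    if (typeof packages !== 'object' || packages === null || Array.isArray(packages)) {
      errors.push(`${datasource}: must be an object`);
      continue;
    }
    const entries = packages as Record<string, unknown>;
    for (const name of Object.keys(entries)) {
      const url = entries[name];
      if (typeof url !== 'string' || !/^https?:\/\//.test(url)) {
        errors.push(`${datasource}.${name}: must be an http(s) URL string`);
      }
    }
  }
  return errors;
}

// Made-up inputs: one valid file, one with a wrong value type.
const validFile: unknown = JSON.parse('{"npm":{"example-package":"https://github.com/example/repo"}}');
const invalidFile: unknown = JSON.parse('{"npm":{"bad-package":42}}');

console.log(validateSourceUrls(validFile)); // []
console.log(validateSourceUrls(invalidFile)); // [ 'npm.bad-package: must be an http(s) URL string' ]
```

A lint step could fail the build whenever the returned list is not empty.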

@rarkins
Collaborator Author

rarkins commented Sep 27, 2024

@RahulGautamSingh what's remaining for this one?

@RahulGautamSingh
Collaborator

Schemas for the replacements.json, source-urls.json & changelog-urls.json.

@renovate-release
Collaborator

🎉 This issue has been resolved in version 38.135.3 🎉

The release is available on:

Your semantic-release bot 📦🚀
