
Make generate_release_notes.py much faster #5001

Closed
fingolfin opened this issue Aug 17, 2022 · 3 comments · Fixed by #5613

fingolfin (Member) commented Aug 17, 2022

Right now generate_release_notes.py takes half an hour or so to query the 192 relevant PRs from the GitHub website.

That's really bad, and I am sure we can do better. Indeed, using the gh command line tool I can execute the relevant query in about 5 seconds, producing JSON that is not far from what we need. Note that unlike our existing script, I am using the merged filter / mergedAt property (instead of closed / closedAt), which reduces the number of matches server side, and I also tell GitHub to filter out anything carrying a certain label. These two changes alone do not explain the several orders of magnitude difference in performance, but they do contribute.

gh pr list --search 'merged:>=2019-09-09 -label:"release notes: not needed"' --json number,title,closedAt,labels,mergedAt --limit 200

(Actually, I first run this with a smaller limit to find out how many matches there are, then set the limit high enough to get all of them.)
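The two-step approach above can be automated: start with a modest limit and, if the result set fills it exactly, double the limit and retry until everything fits. A minimal sketch, assuming the gh CLI is installed; `fetch_merged_prs` and its injectable `run` parameter are illustrative names, not part of the existing script.

```python
import json
import subprocess


def fetch_merged_prs(since, limit=100, run=None):
    """Fetch merged PRs via `gh`, doubling the limit until all matches fit.

    `run` can be injected for testing; by default it invokes the real gh CLI.
    """
    if run is None:
        def run(limit):
            out = subprocess.run(
                ["gh", "pr", "list",
                 "--search", f'merged:>={since} -label:"release notes: not needed"',
                 "--json", "number,title,closedAt,labels,mergedAt",
                 "--limit", str(limit)],
                capture_output=True, check=True, text=True,
            ).stdout
            return json.loads(out)

    while True:
        prs = run(limit)
        # If we filled the limit exactly, there may be more matches: retry larger.
        if len(prs) < limit:
            return prs
        limit *= 2
```

This avoids having to eyeball the match count manually, at the cost of one extra query in the worst case.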

fingolfin (Member, Author) commented:

The following Python code converts the output of the above gh command into the format used by generate_release_notes.py (its prscache.json cache file):

#!/usr/bin/env python3
import json

def main():
    with open("prscache-gh.json", "r") as read_file:
        prs = json.load(read_file)

    # Re-key by PR number (as a string) and keep only the fields
    # that generate_release_notes.py expects.
    new_prs = dict()
    for pr in prs:
        new_prs[str(pr["number"])] = {
            "title": pr["title"],
            "closed_at": pr["closedAt"],
            #"merged_at": pr["mergedAt"],
            "labels": [x["name"] for x in pr["labels"]],
        }

    with open("prscache.json", "w", encoding="utf-8") as f:
        json.dump(new_prs, f, ensure_ascii=False, indent=4)


if __name__ == "__main__":
    main()

fingolfin (Member, Author) commented:

We can do this ourselves using GraphQL. Here is a basic query that essentially gets the data we want (I am sure it could be improved further):

{
  search(
    query: "repo:gap-system/gap merged:>=2019-09-09 -label:'release notes: not needed'"
    type: ISSUE
    last: 100
  ) {
    issueCount
    edges {
      node {
        ... on PullRequest {
          title
          number
          createdAt
          mergedAt
          labels(first: 10) {
            nodes {
              name
            }
          }
        }
      }
    }
  }
}

One can experiment with it interactively via the GraphQL Explorer: https://docs.github.com/en/graphql/overview/explorer

Here are some ways to POST such a query from Python: https://stackoverflow.com/questions/45957784/. Of course some more work is required to deal properly with pagination, and we may also want (or need) to use a GitHub token. But it's a good start, I think.
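A rough sketch of how this could look from Python using only the standard library, with pagination handled via the `pageInfo` / `endCursor` mechanism of the GraphQL search connection. The function names and the shape of `fetch_all` are illustrative, not part of the existing script; a real version would also need error handling and a valid token.

```python
import json
import urllib.request

GITHUB_GRAPHQL = "https://api.github.com/graphql"

SEARCH_QUERY = """
query($cursor: String) {
  search(
    query: "repo:gap-system/gap merged:>=2019-09-09 -label:'release notes: not needed'"
    type: ISSUE
    first: 100
    after: $cursor
  ) {
    issueCount
    pageInfo { hasNextPage endCursor }
    nodes {
      ... on PullRequest {
        title
        number
        mergedAt
        labels(first: 10) { nodes { name } }
      }
    }
  }
}
"""


def extract_page(data):
    """Pull the PR nodes and pagination info out of one GraphQL response."""
    search = data["data"]["search"]
    page = search["pageInfo"]
    return search["nodes"], page["hasNextPage"], page["endCursor"]


def fetch_all(token):
    """Page through the search results until hasNextPage is false."""
    cursor, prs = None, []
    while True:
        payload = json.dumps(
            {"query": SEARCH_QUERY, "variables": {"cursor": cursor}}
        ).encode()
        req = urllib.request.Request(
            GITHUB_GRAPHQL,
            data=payload,
            headers={"Authorization": f"bearer {token}",
                     "Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            data = json.load(resp)
        nodes, more, cursor = extract_page(data)
        prs.extend(nodes)
        if not more:
            return prs
```

Compared with the interactive query above, this variant uses `first`/`after` instead of `last` so that the cursor can be threaded through successive requests.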

fingolfin (Member, Author) commented:

For minor releases, we want a slightly different query: we also want to require label:"backport-to-4.12-DONE".

Also, the start date can be computed from the previous tag: for a minor release such as v4.12.1 that is the previous release tag (here v4.12.0), while for a major release such as v4.12.0 the tag to use is v4.13dev (or alternatively: git merge-base stable-4.12 master).

All of these could be derived from the new version: it should suffice to say something like "release 4.12.1" and the script should automatically determine all relevant tags, dates, queries, etc.
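The derivation rules above can be sketched as two small helpers. This follows exactly the scheme described in this comment (previous release tag for a minor release, the vX.(Y+1)dev tag for a major one, plus the backport label for minor releases); the function names are hypothetical.

```python
def previous_tag(version):
    """Derive the comparison tag from a release version string "X.Y.Z".

    Minor release (Z > 0): compare against the previous release tag vX.Y.(Z-1).
    Major release (Z == 0): compare against the vX.(Y+1)dev tag.
    """
    major, minor, patch = (int(p) for p in version.split("."))
    if patch > 0:
        return f"v{major}.{minor}.{patch - 1}"
    return f"v{major}.{minor + 1}dev"


def extra_search_filter(version):
    """Extra search term needed for minor releases only."""
    major, minor, patch = (int(p) for p in version.split("."))
    if patch > 0:
        return f'label:"backport-to-{major}.{minor}-DONE"'
    return ""
```

With these, saying "release 4.12.1" is enough to obtain both the base tag v4.12.0 and the backport label filter.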
