Python script to get an overview of changes with PyGithub #4277
Conversation
I will rebase this PR and squash some commits. @danielrademacher, is this fine with you? You will have to reset your clone after that if you need to push new commits.
dev/release-notes.py (Outdated)

```python
removelist = []
for k in prs:
    #if not "release notes: use title" in item[2]:
    if not "release notes: added" in prs[k]["labels"]:
```
For the next release notes we have to use "release notes: use title" instead of "release notes: added" at this point. At the moment we use "release notes: added" to see whether the script works with the current release notes.
Yes, of course. Thanks for telling me :)
dev/release-notes.py (Outdated)

```python
# TODO: for now this will do, use Sergio's code later
with open("/Users/alexk/.github_shell_token", "r") as f:
    accessToken = f.read().replace("\n", "")
g = Github(accessToken)
```
One last comment for anybody using this script:
You have two options to connect with GitHub:
- Use your username and password
- Use an access token (can be generated in Settings -> Developer Settings -> Personal access tokens)
In any case, use the second variant with the token. Otherwise, if you push your version of the script, your password ends up in plaintext on GitHub.
Does this already take backport labels into account / support patch releases? As far as I could see, only major releases (4.12.0) are supported, but not minor/patch (4.11.1).
@fingolfin not yet. For that purpose, should I simply impose an additional condition for the appropriate backport label?
Yes, I think we simply filter based on whether the
@fingolfin OK - tried filtering those which are backported; you can see the result in #4257. Will add the logic for those labeled but not yet backported and push tomorrow. Also rebased this PR and squashed some fixups.
dev/release-notes.py (Outdated)

```python
@@ -0,0 +1,211 @@
#!/usr/bin/env python3
#
# Usage: ./release-notes.py YYYY-MM-DD
```
This comment is redundant, and also not super helpful. Somewhere it should be stated what the YYYY-MM-DD is used for (either here in the comment, or in the usage() function, but not both).
And I guess you also need an optional argument to indicate whether this is for a minor or major release. There are many ways I can think of how that would look, but probably the easiest to implement is to just allow passing in the name of an additional label to be used in queries. If one wants to be a bit more fancy, one can take a version like 4.11 and then derive the backport-4.11-DONE label name from that. And of course even the 4.11 could be derived from the version of GAP in the current directory, so one would then just have to specify a flag --backports or --minor-release or so.
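A hedged sketch of the "derive the label from a version" idea above; the function name is mine, and the label format is assumed from the backport-4.11-DONE example:

```python
def backport_label(version):
    # "4.11" and "4.11.1" both map to the label used for completed
    # backports of that minor release series.
    major, minor = version.split(".")[:2]
    return f"backport-{major}.{minor}-DONE"
```

With that, `--minor-release 4.11` (or a version read from the source tree) is enough to pick the query label.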
dev/release-notes.py
Outdated
for k in prs: | ||
# The format of an entry of list is: ["title of PR", "Link" (Alternative the PR number can be used), [ list of labels ] ] | ||
if "release notes: highlight" in prs[k]["labels"]: | ||
# TODO: writing up these details should be a function |
Indeed...
@fingolfin perhaps there is no need for a function, after it was reduced to two lines...
@fingolfin what would be the best way to get a local copy of the changes? Your changes went to the PR branch in the main repo, and @danielrademacher's were pushed to a branch in my fork.
Ah, that's not true - I allowed edits, and all changes are now in my fork. Then I know what to do.
This script could also generate a big part of the package update news, if we start accompanying the package distribution tarballs with gzipped JSON files containing the metadata, as generated by that script in
Just some feedback about the script. Another point:
We need to use both the old "release notes: added" label and the newly introduced "release notes: use title" label, since both labels may appear in the GAP 4.12.0 changes overview.
This makes its special treatment early in the script unnecessary. Also improved comments and instructions for the unsorted PRs.
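The two-label check could be sketched as follows; the names `TITLE_LABELS` and `split_prs` are illustrative, not the script's actual identifiers, and `prs` is assumed to map PR numbers to dicts with a "labels" list, as elsewhere in this thread:

```python
# Both labels must be recognised for the GAP 4.12.0 changes overview.
TITLE_LABELS = {"release notes: added", "release notes: use title"}

def split_prs(prs):
    """Separate PRs whose title goes into the notes from the unsorted rest."""
    keep, unsorted_prs = {}, {}
    for number, data in prs.items():
        if TITLE_LABELS & set(data["labels"]):
            keep[number] = data
        else:
            unsorted_prs[number] = data
    return keep, unsorted_prs
```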
This seems quite usable - maybe merge it and continue improvements in next PRs? I compared the output it would produce for 4.11.1, and a couple of discrepancies are all explainable (e.g. something that was already in GAP 4.11.0 and should have been reported before, or a PR merged before GAP 4.11.0 but backported after). Also, below is what it produces today for the major release.

Release Notes
New features and major changes
Fixed bugs that could lead to incorrect results
Fixed bugs that could lead to crashes
Fixed bugs that could lead to break loops
Other fixed bugs
Improved and extended functionality
New features
Performance improvements
Improvements to the interface which allows 3rd party code to link GAP as a library
Improvements in the support for using the Julia garbage collector
Changed documentation
Packages
Other changes
I'd be happy for this to be merged, with subsequent work being put into separate PRs.
It's a shame that the GitHub API for listing PRs doesn't offer as much filtering power as does the GitHub website, which means that you have to run through all PRs to get the ones you want. e.g. there's no easy way to only get your hands on the PRs in a list like:
That's the main problem with this script, but I also can't see an obvious way round it.
I have just run the script (with both `minor` and then `major`) on my computer and the output seemed reasonable. It also took about 25 minutes.
(One option for a quicker script would be to scrape the HTML webpage of one of those long queries to get at least some initial data, possibly even the whole list of PR numbers that we want to investigate further. It's obviously somewhat unsupported, however.)
Concerning @wilfwilson's idea to "scrape the HTML webpage of one of those long queries":
I have investigated where API queries are spent. My understanding is that you first call

We are now going back through the history until September 2019 for the master branch. Had we released more often in the meantime, that would be faster. Maybe in the future, with a more frequent major release cycle, this will be faster again. OTOH, if at some point the problem is that we're running out of 5000 calls, one could implement a rate limiter, e.g. sleep for 5 seconds between iterations. Even with that, I would not mind running this for an hour. It's not like the dreaded task of collating PRs manually in markdown and then converting it to GAPDoc, even though semi-automatically.
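The "sleep between iterations" idea could be a small generator wrapper (a sketch; the delay value is whatever keeps a run under GitHub's 5000-requests/hour budget):

```python
import time

def iterate_politely(items, delay=5.0):
    # Yield each item, then pause, so that a loop over API objects
    # spreads its requests out instead of bursting through the quota.
    for item in items:
        yield item
        time.sleep(delay)

# Hypothetical usage with a PyGithub repository object:
#   for pr in iterate_politely(repo.get_pulls(state="closed")):
#       ...  # each attribute access may cost an API call
```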
@wilfwilson btw, both minor and major use the same cache - I think once you've done one, you should be able to run the other in seconds.
@alex-konovalov wrote:
This seems to be incorrect. Yes, the initial "call" to In particular, it seems that getting the PRs via To be clearer, you are correct that this is very cheap, and quick:
But that's because it's not doing anything. Even just choosing a PR is expensive and slow:
takes a couple of seconds. In particular, running the following subset of your script:
has so far taken over 8 minutes, and I have only got from PR 4337 to PR 3200. I estimate, therefore, that the vast majority of the 25 minutes is required just to get the merging date of each PR, and overall, getting the labels of the relevant PRs is a small fraction of the 25 minutes. In particular, the time taken by this script is almost completely independent of the value of
Note that calling via the HTTP API is seemingly a lot quicker, and gives you all of the information you want without having to load things in separately. Consider the following query: (click it!) The maximum value of

You could then load as many of these pages as you want in Python via the requests module, parse the JSON into a dictionary, and then you've got all the information you want, including labels. I don't know what the 'official' way of interacting with the HTTP API is, but there surely is one, and I'm not sure what the limits are, but it seems like this could hopefully lead to a script that takes on the order of seconds or minutes rather than half an hour.
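A sketch of that approach using only the standard library instead of requests (the repository path matches this project; per_page is capped at 100 by the GitHub REST API, and the helper names are mine):

```python
import json
import urllib.request
from urllib.parse import urlencode

PULLS_API = "https://api.github.com/repos/gap-system/gap/pulls"

def pulls_url(page, per_page=100, state="closed"):
    # Build one page of the paginated "list pull requests" query.
    query = urlencode({"state": state, "per_page": per_page, "page": page})
    return f"{PULLS_API}?{query}"

def fetch_pulls_page(page):
    # Network access required. Each returned item already carries
    # "title", "labels", "merged_at" etc., so no per-PR follow-up
    # calls are needed, unlike the PyGithub attribute accesses above.
    with urllib.request.urlopen(pulls_url(page)) as resp:
        return json.load(resp)
```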
The cache worked successfully for me 🙂 |
That's my Python script from the GAP Days which works with HTML:
The output is not corrected and improved like in Alex's script, and so it looks like this at the moment:
And
I also had to make a small adjustment right now to get it working. But the script is pretty fast.
This perhaps emphasises that it is a good goal to have the code for 'getting the data about merged PRs' and the code for 'parsing the data about merged PRs into release notes' be largely independent, which I think @ThomasBreuer might have suggested already. Given the fast (and stable) HTTP API, I don't think it will be necessary to maintain an HTML scraper, but since you've written it, it's good to have it – thanks for sharing @danielrademacher.
Thanks for the correction, @wilfwilson - you are right that
Hi folks, amazing work you did there! :) I noticed that some sections of the generated markdown are empty, like
I think at first sight it's not clear whether these headings are hierarchical or not. Could we just add a
I agree with @alex-konovalov. Let's get this merged in and further improve this in additional PRs. Wrt what @wilfwilson wrote:
I've used per_page=100 and checked the headers. When you arrive at the last page of the query, the
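Detecting the last page could be sketched like this; GitHub signals pagination via the Link response header, and the helper name here is mine:

```python
import re

def next_page_url(link_header):
    # GitHub paginates via the Link header, e.g.
    #   <https://api.github.com/...&page=2>; rel="next",
    #   <https://api.github.com/...&page=9>; rel="last"
    # On the last page there is no rel="next" entry, so we return None.
    if not link_header:
        return None
    for part in link_header.split(","):
        match = re.search(r'<([^>]+)>;\s*rel="next"', part)
        if match:
            return match.group(1)
    return None
```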
I've (squashed and) merged - giving a nice base for future work. Let's try not to forget the comments/suggestions/discussions in this thread, though! 🙂
This is a looong discussion, though -- if people think there are points in here we should not forget, I recommend putting them into one or multiple issues (possibly with a link to here in case there's additional context to be found).
See #4257 for the discussion.