Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added automatic size labeler #37262

Merged
merged 11 commits into from
Apr 27, 2024
Merged

Conversation

amanmoon
Copy link
Contributor

@amanmoon amanmoon commented Feb 8, 2024

I have implemented the automatic size labeler, which now assigns labels to pull requests based on the number of lines changed

Minimal
Typically involves very small changes, bug fixes, or updates that require only a few lines of code, often less than 50.
#37208 #37146 #37043

Small
Involves more substantial changes than minimal, potentially adding new features or making modifications to existing ones. The range is usually between 50 to 100 lines of code.
#37152 #37132

Moderate
Represents a significant portion of the codebase being modified, such as adding new features, refactoring, or making extensive changes to existing functionalities. This might involve between 100 to 300 lines of code.
#36919 #37112

Large
Involves substantial and complex changes across various parts of the codebase. This could include major architectural changes, the introduction of new modules, or a significant overhaul of existing features, often exceeding 300 lines of code.
#37125 #36977 #36972

📝 Checklist

  • The title is concise, informative, and self-explanatory.
  • The description explains in detail what this PR is about.
  • I have linked a relevant issue or discussion.

Fixes: #37254

@amanmoon amanmoon marked this pull request as draft February 8, 2024 23:10
@amanmoon amanmoon marked this pull request as ready for review February 9, 2024 12:29
@amanmoon amanmoon force-pushed the size_labeler/checker branch 2 times, most recently from 1eeac5f to 56602ef Compare February 9, 2024 13:26
@mkoeppe
Copy link
Contributor

mkoeppe commented Feb 9, 2024

If the changes exceed 200 lines (insertion + deletions), the label will be set to huge; otherwise, it will be labeled as tiny.

That's a pretty dramatic transition from "tiny" to "huge" in 1 line...
This may need a bit more thought?

@amanmoon
Copy link
Contributor Author

amanmoon commented Feb 9, 2024

I was thinking something like
Minimal - less than 50 lines of changes
Small - between 50 to 100 lines of changes
Moderate - between 100 to 300 lines
Large - more than 300 lines of changes?

@mkoeppe
Copy link
Contributor

mkoeppe commented Feb 9, 2024

It may be useful to illustrate such a proposal by putting examples of PRs of such sizes in the description, to facilitate discussion.

Also consider whether labels are necessary for the sizes in the middle. Labels for the smallest category and for the largest category have clear functions (which?)

@mkoeppe mkoeppe requested a review from soehms February 24, 2024 22:55
@soehms
Copy link
Member

soehms commented Feb 28, 2024

First some questions:

  1. What was the reason for not using the GH actions from the marketplace suggested at CI: Add pull request size labeler / checker #37254?
  2. Shouldn't we group the new labels with a prefix, say v: (for volume), as we did for example for status and priority labels (s: , p: )?
  3. Should it be allowed to add or remove the new labels manually?
  4. We already have a bot to sync groups of labels that have a common meaning. It comes as a Python script .github/sync_labels which is called from .github/workspaces/sync_labes.yml. Wouldn't it make sense to integrate the new functionality there? Then automatic label manipulation would be concentrated in one place and we wouldn't waste resources by launching GitHub workflows twice.
  5. Did you notice, the issue with the GitHub web-interface described in this sage-devel post?

@amanmoon
Copy link
Contributor Author

amanmoon commented Mar 1, 2024

What was the reason for not using the GH actions from the marketplace suggested at #37254?

Not using the GitHub Action will give us flexibility. I was thinking using regex, we can skip blank lines and also exclude changes in documentation from counting towards the final number of lines changed.

Should it be allowed to add or remove the new labels manually?

I don't know how can we do this, hitting Github label's endpoint won't return who added those labels.
We can define another set of labels which can only be added manually and if these are detected it will stop the Github label bot.
There might be some better solution for this.

Shouldn't we group the new labels with a prefix, say v: (for volume), as we did for example for status and priority labels (s: , p: )?
We already have a bot to sync groups of labels that have a common meaning. It comes as a Python script .github/sync_labels which is called from .github/workspaces/sync_labes.yml. Wouldn't it make sense to integrate the new functionality there? Then automatic label manipulation would be concentrated in one place and we wouldn't waste resources by launching GitHub workflows twice.

I'll address these issues on my next commit.

Did you notice, the issue with the GitHub web-interface described in this sage-devel post?

This worked fine, I tested the code on one of my repo.

@soehms
Copy link
Member

soehms commented Mar 1, 2024

What was the reason for not using the GH actions from the marketplace suggested at #37254?

Not using the GitHub Action will give us flexibility. I was thinking using regex, we can skip blank lines and also exclude changes in documentation from counting towards the final number of lines changed.

Yes, flexibility is a good point, but I'm not sure we should exclude changes in documentation.

Should it be allowed to add or remove the new labels manually?

I don't know how can we do this, hitting Github label's endpoint won't return who added those labels. We can define another set of labels which can only be added manually and if these are detected it will stop the Github label bot. There might be some better solution for this.

The sync_labels bot posts a warning comment that is deleted after a few minutes if the user changes a status or priority label in a non-compliant manner. Maybe we can do something similar here.

Shouldn't we group the new labels with a prefix, say v: (for volume), as we did for example for status and priority labels (s: , p: )?
We already have a bot to sync groups of labels that have a common meaning. It comes as a Python script .github/sync_labels which is called from .github/workspaces/sync_labes.yml. Wouldn't it make sense to integrate the new functionality there? Then automatic label manipulation would be concentrated in one place and we wouldn't waste resources by launching GitHub workflows twice.

I'll address these changes on my next commit.

Great!

Did you notice, the issue with the GitHub web-interface described in this sage-devel post?

This worked fine, I tested the code on one of my repo.

I think it would be less critical here than with state labels, but we should keep attention on this.

FYI: I won't reply for a week as I'm on vacation.

@amanmoon amanmoon force-pushed the size_labeler/checker branch from 728ca5b to 28d9088 Compare March 3, 2024 23:17
Copy link
Member

@soehms soehms left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry! Maybe I gave a bad hint. I expected that the logic could be included in the Python file, but I overlooked the fact that sync_labels doesn't checkout the entire repository (for performance reasons), which is what is needed here. Therefore, I'm no longer sure whether it makes sense to include the code in sync_labels.yml. I hope that reverting to the previous version won't cause you too much trouble.

However, I tested your code a bit in my fork soehms#13. So far things are looking good to me. But maybe you should consider making the constants (SMALL_THRESHOLD ...) changeable via repository variables.

@amanmoon
Copy link
Contributor Author

amanmoon commented Mar 12, 2024

Sure!
can we integrate #37373 here?

@amanmoon amanmoon force-pushed the size_labeler/checker branch 4 times, most recently from 7739cd7 to 9448ab9 Compare March 12, 2024 06:57
@amanmoon amanmoon force-pushed the size_labeler/checker branch from 9448ab9 to da0903d Compare March 12, 2024 06:58
@soehms
Copy link
Member

soehms commented Mar 12, 2024

can we integrate #37373 here?

Sounds reasonable! If you have the resources to do it, please do it!

I wouldn't let the job fail if the repository variables are not set. It's better to just skip it. Otherwise, after the PR is merged but the variables are not set, every open PR would have at least one failed job after each push.

Some notes on code style and structure:

  1. I think you should use environment variables for repeatedly used literals (like v: minimal...).
  2. You should consider implementing more complicated logic in a Python script (or at least a shell script) called in the yml. This would make testing much easier.

@soehms
Copy link
Member

soehms commented Mar 12, 2024

I wouldn't let the job fail if the repository variables are not set.

However, even if they are set the job fails (see this run in soehms#14).

vbraun pushed a commit to vbraun/sage that referenced this pull request Apr 18, 2024
    
I have implemented the `automatic size labeler`, which now assigns
labels to pull requests based on the number of lines changed

**Minimal**
Typically involves very small changes, bug fixes, or updates that
require only a few lines of code, often less than 50.
sagemath#37208 sagemath#37146 sagemath#37043

**Small**
Involves more substantial changes than minimal, potentially adding new
features or making modifications to existing ones. The range is usually
between 50 to 100 lines of code.
sagemath#37152 sagemath#37132

**Moderate**
Represents a significant portion of the codebase being modified, such as
adding new features, refactoring, or making extensive changes to
existing functionalities. This might involve between 100 to 300 lines of
code.
sagemath#36919 sagemath#37112

**Large**
Involves substantial and complex changes across various parts of the
codebase. This could include major architectural changes, the
introduction of new modules, or a significant overhaul of existing
features, often exceeding 300 lines of code.
sagemath#37125 sagemath#36977 sagemath#36972

### 📝 Checklist

<!-- Put an `x` in all the boxes that apply. -->
<!-- If your change requires a documentation PR, please link it
appropriately -->
<!-- If you're unsure about any of these, don't hesitate to ask. We're
here to help! -->
<!-- Feel free to remove irrelevant items. -->

- [x] The title is concise, informative, and self-explanatory.
- [x] The description explains in detail what this PR is about.
- [x] I have linked a relevant issue or discussion.

Fixes: sagemath#37254
    
URL: sagemath#37262
Reported by: Aman Moon
Reviewer(s): Sebastian Oehms
vbraun pushed a commit to vbraun/sage that referenced this pull request Apr 20, 2024
    
I have implemented the `automatic size labeler`, which now assigns
labels to pull requests based on the number of lines changed

**Minimal**
Typically involves very small changes, bug fixes, or updates that
require only a few lines of code, often less than 50.
sagemath#37208 sagemath#37146 sagemath#37043

**Small**
Involves more substantial changes than minimal, potentially adding new
features or making modifications to existing ones. The range is usually
between 50 to 100 lines of code.
sagemath#37152 sagemath#37132

**Moderate**
Represents a significant portion of the codebase being modified, such as
adding new features, refactoring, or making extensive changes to
existing functionalities. This might involve between 100 to 300 lines of
code.
sagemath#36919 sagemath#37112

**Large**
Involves substantial and complex changes across various parts of the
codebase. This could include major architectural changes, the
introduction of new modules, or a significant overhaul of existing
features, often exceeding 300 lines of code.
sagemath#37125 sagemath#36977 sagemath#36972

### 📝 Checklist

<!-- Put an `x` in all the boxes that apply. -->
<!-- If your change requires a documentation PR, please link it
appropriately -->
<!-- If you're unsure about any of these, don't hesitate to ask. We're
here to help! -->
<!-- Feel free to remove irrelevant items. -->

- [x] The title is concise, informative, and self-explanatory.
- [x] The description explains in detail what this PR is about.
- [x] I have linked a relevant issue or discussion.

Fixes: sagemath#37254
    
URL: sagemath#37262
Reported by: Aman Moon
Reviewer(s): Sebastian Oehms
@vbraun vbraun merged commit c315128 into sagemath:develop Apr 27, 2024
11 checks passed
@mkoeppe mkoeppe added this to the sage-10.4 milestone Apr 27, 2024
@amanmoon
Copy link
Contributor Author

amanmoon commented Apr 28, 2024

@soehms, The Action variables are not set, leading the action to be skipped.

@amanmoon amanmoon deleted the size_labeler/checker branch April 29, 2024 16:37
@soehms
Copy link
Member

soehms commented May 2, 2024

@soehms, The Action variables are not set, leading the action to be skipped.

This must be done by a maintainer of the repository. @mkoeppe, can you enable the feature by setting the repository variables SMALL_THRESHOLD, MODERATE_THRESHOLD and LARGE_THRESHOLD? Do we need to announce this on sage-devel?

@mkoeppe
Copy link
Contributor

mkoeppe commented May 6, 2024

I've added these variables with values 50, 100, 300, as per ticket description (to my understanding)

@mkoeppe
Copy link
Contributor

mkoeppe commented May 6, 2024

Do we need to announce this on sage-devel?

I think that would be a great idea.

Also it would be good to explain it in https://github.com/sagemath/sage/wiki/Sage-10.4-Release-Tour#automatic-pr-size-labeler -- I've only created this section as a stub

@soehms
Copy link
Member

soehms commented May 7, 2024

I've added these variables with values 50, 100, 300, as per ticket description (to my understanding)

Thanks!

@soehms
Copy link
Member

soehms commented May 7, 2024

Do we need to announce this on sage-devel?

I think that would be a great idea.

See https://groups.google.com/g/sage-devel/c/w4IeYgXgVUc.

Also it would be good to explain it in https://github.com/sagemath/sage/wiki/Sage-10.4-Release-Tour#automatic-pr-size-labeler -- I've only created this section as a stub

I filled in a sentence there.

@soehms
Copy link
Member

soehms commented May 7, 2024

@amanmoon You should have a look at this workflow run.

@amanmoon
Copy link
Contributor Author

amanmoon commented May 7, 2024

@amanmoon You should have a look at [this workflow run]
(https://github.com/sagemath/sage/actions/runs/8979068478/job/24660492110#step:3:21).

If the author does not have the latest develop branch (with this auto labeler PR merged) merged with his PR, the file not found error will occur.
After everyone is on the latest commit this won't cause any issues. We can add a test to check this for now and remove the test after some time, as the test will be irrelevant when everyone is on the latest commit.

The moderate threshold label is not been assigned to PR even if the changes are between 100 and 300. we can see this here.
Just to be sure, I tested the code again here.

@soehms
Copy link
Member

soehms commented May 7, 2024

@amanmoon You should have a look at [this workflow run]
(https://github.com/sagemath/sage/actions/runs/8979068478/job/24660492110#step:3:21).

If the author does not have the latest develop branch (with this auto labeler PR merged) merged with his PR, the file not found error will occur. After everyone is on the latest commit this won't cause any issues.

I see!

We can add a test to check this for now and remove the test after some time, as the test will be irrelevant when everyone is on the latest commit.

I don't think this is necessary!

The moderate threshold label is not been assigned to PR even if the changes are between 100 and 300. we can see this here. Just to be sure, I tested the code again here.

This looks like a line feed occurs at the end of the MODERATE_THRESHOLD value. @mkoeppe can you check this value?

@mkoeppe
Copy link
Contributor

mkoeppe commented May 7, 2024

This looks like a line feed occurs at the end of the MODERATE_THRESHOLD value. @mkoeppe can you check this value?

Indeed. I've removed it.

@mkoeppe
Copy link
Contributor

mkoeppe commented May 7, 2024

If the author does not have the latest develop branch (with this auto labeler PR merged) merged with his PR, the file not found error will occur.
After everyone is on the latest commit this won't cause any issues. We can add a test to check this for now and remove the test after some time, as the test will be irrelevant when everyone is on the latest commit.

I think it would be better if the script was taken from a separate checkout of the develop branch, not the PR branch:
As the script runs with elevated privileges to make the changes to PRs, the workflow should not execute arbitrary code from a PR.

Also, recently I have switched to putting such helper scripts in the .ci/ directory instead of the .github/workflows/ directory, which I think makes the distinction clearer (.githhub/workflows/ is referred to by actions references, whereas the helper scripts always come from some git checkout).

@soehms
Copy link
Member

soehms commented May 9, 2024

I think it would be better if the script was taken from a separate checkout of the develop branch, not the PR branch: As the script runs with elevated privileges to make the changes to PRs, the workflow should not execute arbitrary code from a PR.

It is also possible to checkout single files (see for example here).

Also, recently I have switched to putting such helper scripts in the .ci/ directory instead of the .github/workflows/ directory, which I think makes the distinction clearer (.githhub/workflows/ is referred to by actions references, whereas the helper scripts always come from some git checkout).

I agree! There are some more to move:

$ LIST='(extract-sage-local.sh|scan-logs.sh|conda-lock-update.py|sync_labels.py|set_labels_by_changes.sh)'
$ egrep $LIST .github/workflows/*.yml
.github/workflows/ci-macos.yml:          .github/workflows/scan-logs.sh "artifacts/$LOGS_ARTIFACT_NAME"
.github/workflows/docker.yml:          .github/workflows/scan-logs.sh "artifacts/$LOGS_ARTIFACT_NAME"
.github/workflows/macos.yml:          .github/workflows/extract-sage-local.sh sage-local-artifact/sage-local-*.tar
.github/workflows/macos.yml:          .github/workflows/scan-logs.sh "artifacts/$LOGS_ARTIFACT_NAME"
.github/workflows/pr-labeler.yml:          chmod a+x .github/workflows/set_labels_by_changes.sh
.github/workflows/pr-labeler.yml:          .github/workflows/set_labels_by_changes.sh
.github/workflows/sync_labels.yml:          files: .github/sync_labels.py
.github/workflows/sync_labels.yml:          chmod a+x .github/sync_labels.py
.github/workflows/sync_labels.yml:          .github/sync_labels.py $ACTION $ISSUE_URL $PR_URL $ACTOR "$LABEL" "$REV_STATE" $LOG_LEVEL
.github/workflows/sync_labels.yml:          chmod a+x .github/sync_labels.py
.github/workflows/sync_labels.yml:          .github/sync_labels.py $REPO_URL $LOG_LEVEL

@amanmoon, can you open a corresponding follow-up PR?

@mkoeppe
Copy link
Contributor

mkoeppe commented May 9, 2024

Note I'm taking care of some of the scripts in:

@soehms
Copy link
Member

soehms commented May 10, 2024

Note that @tscrim started a discussion on the benefit of the labels in the sage-devel-thread.

@mkoeppe
Copy link
Contributor

mkoeppe commented May 28, 2024

There's a complaint that the size is sometimes wrong: https://groups.google.com/g/sage-devel/c/w4IeYgXgVUc/m/Og7NP-3VAQAJ

@amanmoon amanmoon mentioned this pull request May 30, 2024
3 tasks
vbraun pushed a commit to vbraun/sage that referenced this pull request Jun 3, 2024
    
Moved the file set_labels_by_changes.sh to to the .ci directory.
The workflow runs from the main repo.
Fixed bug, the correct labels will be added even if the feature branch
is behind.

sagemath#37262
sagemath#37262 (comment)
https://groups.google.com/g/sage-devel/c/w4IeYgXgVUc/m/Og7NP-3VAQAJ
<!-- Put an `x` in all the boxes that apply. -->

- [x] The title is concise and informative.
- [x] The description explains in detail what this PR is about.
- [x] I have linked a relevant issue or discussion.
    
URL: sagemath#38114
Reported by: Aman Moon
Reviewer(s): Matthias Köppe
vbraun pushed a commit to vbraun/sage that referenced this pull request Jun 4, 2024
    
Moved the file set_labels_by_changes.sh to to the .ci directory.
The workflow runs from the main repo.
Fixed bug, the correct labels will be added even if the feature branch
is behind.

sagemath#37262
sagemath#37262 (comment)
https://groups.google.com/g/sage-devel/c/w4IeYgXgVUc/m/Og7NP-3VAQAJ
<!-- Put an `x` in all the boxes that apply. -->

- [x] The title is concise and informative.
- [x] The description explains in detail what this PR is about.
- [x] I have linked a relevant issue or discussion.
    
URL: sagemath#38114
Reported by: Aman Moon
Reviewer(s): Matthias Köppe
vbraun pushed a commit to vbraun/sage that referenced this pull request Jun 5, 2024
    
Moved the file set_labels_by_changes.sh to to the .ci directory.
The workflow runs from the main repo.
Fixed bug, the correct labels will be added even if the feature branch
is behind.

sagemath#37262
sagemath#37262 (comment)
https://groups.google.com/g/sage-devel/c/w4IeYgXgVUc/m/Og7NP-3VAQAJ
<!-- Put an `x` in all the boxes that apply. -->

- [x] The title is concise and informative.
- [x] The description explains in detail what this PR is about.
- [x] I have linked a relevant issue or discussion.
    
URL: sagemath#38114
Reported by: Aman Moon
Reviewer(s): Matthias Köppe
vbraun pushed a commit to vbraun/sage that referenced this pull request Jun 8, 2024
    
Moved the file set_labels_by_changes.sh to to the .ci directory.
The workflow runs from the main repo.
Fixed bug, the correct labels will be added even if the feature branch
is behind.

sagemath#37262
sagemath#37262 (comment)
https://groups.google.com/g/sage-devel/c/w4IeYgXgVUc/m/Og7NP-3VAQAJ
<!-- Put an `x` in all the boxes that apply. -->

- [x] The title is concise and informative.
- [x] The description explains in detail what this PR is about.
- [x] I have linked a relevant issue or discussion.
    
URL: sagemath#38114
Reported by: Aman Moon
Reviewer(s): Matthias Köppe
@soehms
Copy link
Member

soehms commented Jun 17, 2024

It may be useful to illustrate such a proposal by putting examples of PRs of such sizes in the description, to facilitate discussion.

Also consider whether labels are necessary for the sizes in the middle. Labels for the smallest category and for the largest category have clear functions (which?)

Sorry, I lost track of this hint. At least because of that @vdelecroix is right in speaking of a wrong guidance in this sage-devel post. Another point is that I did not expect this PR to become controversial because I thought the naming of the labels made it clear that they were just quantitative indicators that could help with preselection in the GitHub search function. Now, I agree with @tscrim that we should have involved sage-devel much earlier.

@tscrim tscrim mentioned this pull request Jul 4, 2024
5 tasks
vbraun pushed a commit to vbraun/sage that referenced this pull request Jul 20, 2024
    
<!-- ^ Please provide a concise and informative title. -->
<!-- ^ Don't put issue numbers in the title, do this in the PR
description below. -->
<!-- ^ For example, instead of "Fixes sagemath#12345" use "Introduce new method
to calculate 1 + 2". -->
<!-- v Describe your changes below in detail. -->
<!-- v Why is this change required? What problem does it solve? -->
<!-- v If this PR resolves an open issue, please link to it here. For
example, "Fixes sagemath#12345". -->

We revert sagemath#37262. This was voted on sage-devel:
https://groups.google.com/g/sage-devel/c/3PLZD-4UFIA

### 📝 Checklist

<!-- Put an `x` in all the boxes that apply. -->

- [x] The title is concise and informative.
- [x] The description explains in detail what this PR is about.
- [x] I have linked a relevant issue or discussion.
- [x] I have created tests covering the changes.
- [x] I have updated the documentation and checked the documentation
preview.

### ⌛ Dependencies

<!-- List all open PRs that this PR logically depends on. For example,
-->
<!-- - sagemath#12345: short description why this is a dependency -->
<!-- - sagemath#34567: ... -->
    
URL: sagemath#38334
Reported by: Travis Scrimshaw
Reviewer(s): Sebastian Oehms
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

CI: Add pull request size labeler / checker
5 participants