Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pylint alerts corrections as part of intervention experiment #419

Closed
evidencebp opened this issue Oct 8, 2024 · 17 comments
Closed

Pylint alerts corrections as part of intervention experiment #419

evidencebp opened this issue Oct 8, 2024 · 17 comments

Comments

@evidencebp
Copy link
Contributor

I'd like to conduct a software engineering experiment regarding the benefit of Pylint alerts removal.
The experiment is described here.
In the experiments, Pylint is used with some specific alerts, files are selected for intervention and control.
After the interventions are done, one can wait and examine the results.

I'm asking your approval for conducting the interventions in your repositories.The interventions are expected to benefit the project and will submitted in PR for approval.

See examples of interventions in stanford-oval/storm, gabfl/vault, and coreruleset/coreruleset.

See the planed internetions.

Is it OK if I'll fix the alerts?

@xiki-tempula
Copy link
Collaborator

xiki-tempula commented Oct 8, 2024

I did have a thought of linting but wouldn't ruff be a better call? pylint is not so popular these days.
image

@evidencebp
Copy link
Contributor Author

I was not familiar with ruff, thank you for letting me know @xiki-tempula , I'll check it.
However, this experiment is on pylint.

Regarding the use of linters, they might help to improve the code significantly.
However, they tend to produce a lot of alerts, making fixing all of them too time consuming.
The goal of this experiment is to build a dataset of fixes and to evaluate their benefit.

The experiment can allow you to see the fixes on your code and their value (and I'll be happy to know what you think of them).
However, note that for the experiment we consider only files that were recently modified and split the files into intervention and control.
This means that there will probably be other alerts that won't be fixed.

So, can I do the interventions?

@xiki-tempula
Copy link
Collaborator

xiki-tempula commented Oct 9, 2024

I kind of prefer to do it with ruff. I'm happy if you want to do it with ruff but I don't think anyone uses pylint, so I don't want to do something that makes pylint happy but makes other more common linting tools unhappy.

@evidencebp
Copy link
Contributor Author

I understand your concern.
Linters are based on software engineering insights and alerts of ones are usually not considered good by other linters.

As you can see in the planed internetions, there are 10 alerts.
There types are: line-too-long, too-many-branches, unnecessary-pass, pointless-statement, and too-many-statements.
All these alerts are rather canonical.
Since ruff has 800 alerts, I'm rather sure it alerts on all these and more.

Also, the benefit of the project is a main concern.
When I notice a false alert, I do not modify the code and the alert is labeled as less beneficial.

@xiki-tempula
Copy link
Collaborator

I think it would be good to do all the linting fix in one go. But I think you could do a PR and I could see if we can merge it.

@evidencebp
Copy link
Contributor Author

Great.

@evidencebp
Copy link
Contributor Author

@xiki-tempula, please see the PR with the alert fixes.
I did not find requirements files so I could not run the code.
Can you consult me on that?

I would like to also consult you regarding two alerts, which I did not fix yet.

In src\alchemlyb\workflows\base.py, the method plot is empty with just pass.
I was not sure if it was forgotten (and goog alert) or overridden in sub-classs
.If it should be overridden it might be better to raise NotImplemented exception to enforce implementation in all sub-classes

In src\alchemlyb\preprocessing\subsampling.py, there is an alert in line 189 on 
df[key]
whose value is not used.
It seems that statement is needed in order to see if the key exists.
However, won't you prefer to check directly with key in df.columns?

@evidencebp
Copy link
Contributor Author

Oh, I see that some test fail due to my changes.
Sorry, I'll check that.

@evidencebp
Copy link
Contributor Author

There were two faliures which I fixed.

The first was in src\alchemlyb\visualisation\dF_state.py
The value of mnb is not set in the case of landscape orientation.
It was like that even before extracting get_determine_orientation.
However, now that this is a separated function, the test fails due to returning an unset variable.

This might detect a bigger bug regarding not setting it unintentionally.

The other was in src\alchemlyb\workflows\abfe.py which pass now.

I also get slightly less coverage

This is due to untested exception raising code.
It was not tested before too yet now in the extraced functions it is considered as a new line (while it is actually a moved uncovered line).

@evidencebp
Copy link
Contributor Author

There is an additional faliure at windows-latest, 3.12
It fails at Run mamba-org/setup-micromamba@main

I could not trace it and I don't recall touching related code.

@xiki-tempula , can you consult on that too?

@evidencebp
Copy link
Contributor Author

I noted that the CI failed on the same problem regardless of my PR.

@evidencebp
Copy link
Contributor Author

@xiki-tempula , are you satisfied with the PR?
Are there things I should modify or clarify?

@xiki-tempula
Copy link
Collaborator

I'm happy with the fix which fixes line too long.
But I'm concerned with regard to putting part of the function into another function.
The separation doesn't seems logic to me and kind of reduces the readability.

@evidencebp
Copy link
Contributor Author

This is a good start already.

Regarding the spliting, I tried to chooce code parts that are

  • Already in the same sequence (not trying to reuse/merge duplicated code)
  • Have a resonable logic that I can identify (and use as the function name)
  • When possible, having comments supporting the above

In general, it is recommended to split to code into small units having a single goal.
The readbility is considered to be increased since you read the functionality and not the implementation details unless they are relevant.

Are there cases in which you are ok with the splitting?

@evidencebp
Copy link
Contributor Author

@xiki-tempula , would you like me to create anew PR, without the splitting, so you will be OK with merging it?

@xiki-tempula
Copy link
Collaborator

without the splitting, please? Thanks.

@evidencebp
Copy link
Contributor Author

Great, I created a dedicated PR
@xiki-tempula

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants