Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactored data clumps with the help of LLMs (research project) #960

Merged
merged 7 commits into from
Jun 17, 2024

Conversation

compf
Copy link
Contributor

@compf compf commented May 28, 2024

Hello maintainers,

I am conducting a master thesis project focused on enhancing code quality through automated refactoring of data clumps, assisted by Large Language Models (LLMs).

Data clump definition

A data clump exists if

  1. two methods (in the same or in different classes) have at least 3 common parameters and one of those methods does not override the other, or
  2. At least three fields in a class are common with the parameters of a method (in the same or in a different class), or
  3. Two different classes have at least three common fields

See also the following UML diagram as an example
Example data clump

I believe these refactoring can contribute to the project by reducing complexity and enhancing readability of your source code.

Pursuant to the EU AI Act, I fully disclose the use of LLMs in generating these refactorings, emphasizing that all changes have undergone human review for quality assurance.

Even if you decide not to integrate my changes to your codebase (which is perfectly fine), I ask you to fill out a feedback survey, which will be scientifically evaluated to determine the acceptance of AI-supported refactorings. You can find the feedback survey under https://campus.lamapoll.de/Data-clump-refactoring/en

Thank you for considering my contribution. I look forward to your feedback. If you have any other questions or comments, feel free to write a comment, or email me under tschoemaker@uni-osnabrueck.de .

Best regards,
Timo Schoemaker
Department of Computer Science
University of Osnabrück

@CLAassistant
Copy link

CLAassistant commented May 28, 2024

CLA assistant check
All committers have signed the CLA.

Copy link
Collaborator

@msridhar msridhar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for this interesting PR. I'm impressed with your tool that discovers this opportunity and is able to perform the transformation automatically. I don't think we can merge the PR as it is, as due to some issues that I believe hinder readability. I tried to capture examples in comments, but basically, I think the MethodAnalysisContext type should be generated differently, and I think any time a getter is invoked multiple times in the same method, it's better to store the value in a local variable. If you are interested in implementing these improvements, I would be open to doing another round of review.

Set<String> annotationContent =
NullabilityUtil.getAnnotationValueArray(methodSymbol, annotName, false);
NullabilityUtil.getAnnotationValueArray(
methodAnalysisContext.getMethodSymbol(), annotName, false);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getMethodSymbol() used multiple times in this method and should be stored in a local variable

if (isAnnotated && !isValid) {
return;
}
Symbol.MethodSymbol closestOverriddenMethod =
NullabilityUtil.getClosestOverriddenMethod(methodSymbol, state.getTypes());
NullabilityUtil.getClosestOverriddenMethod(
methodAnalysisContext.getMethodSymbol(), methodAnalysisContext.getState().getTypes());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly getState() is used multiple times

Comment on lines 156 to 167
methodAnalysisContext
.getState()
.reportMatch(
methodAnalysisContext
.getAnalysis()
.getErrorBuilder()
.createErrorDescription(
new ErrorMessage(ErrorMessage.MessageTypes.ANNOTATION_VALUE_INVALID, message),
tree,
methodAnalysisContext.getAnalysis().buildDescription(tree),
methodAnalysisContext.getState(),
null));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getState() and getAnalysis() used multiple times here

Copy link
Collaborator

@msridhar msridhar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks much better! I have a couple more minor comments. After they are addressed I think this change will be good to go.

public void onMatchMethod(MethodTree tree, MethodAnalysisContext methodAnalysisContext) {

Symbol.MethodSymbol methodSymbol = methodAnalysisContext.methodSymbol();
VisitorState visitorState = methodAnalysisContext.state();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Our convention is to name the VisitorState local variables / parameters state. Could we stick to that? That will actually significantly reduce the size of this PR and make it more readable.

Comment on lines 56 to 58
return Objects.equals(analysis, that.analysis)
&& Objects.equals(state, that.state)
&& Objects.equals(methodSymbol, that.methodSymbol);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can just do analysis.equals(that.analysis), etc. rather than using Objects.equals here as none of the fields in this class should ever be null (and we run NullAway on itself to check this)

@msridhar
Copy link
Collaborator

Also note that I merged in the latest main branch so best to pull before making further changes

Copy link

codecov bot commented Jun 14, 2024

Codecov Report

Attention: Patch coverage is 83.05085% with 10 lines in your changes missing coverage. Please review.

Project coverage is 85.88%. Comparing base (c4aed81) to head (bbedf7b).

Files Patch % Lines
.../uber/nullaway/handlers/MethodAnalysisContext.java 44.44% 10 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master     #960      +/-   ##
============================================
- Coverage     85.94%   85.88%   -0.07%     
- Complexity     2047     2051       +4     
============================================
  Files            81       82       +1     
  Lines          6765     6806      +41     
  Branches       1305     1312       +7     
============================================
+ Hits           5814     5845      +31     
- Misses          537      547      +10     
  Partials        414      414              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Collaborator

@msridhar msridhar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for the contribution!

@msridhar msridhar enabled auto-merge (squash) June 17, 2024 17:50
@msridhar msridhar merged commit a4ce249 into uber:master Jun 17, 2024
9 of 11 checks passed
benkard pushed a commit to benkard/jgvariant that referenced this pull request Aug 8, 2024
This MR contains the following updates:

| Package | Type | Update | Change |
|---|---|---|---|
| [org.tukaani:xz](https://tukaani.org/xz/java.html) ([source](https://github.com/tukaani-project/xz-java)) | compile | minor | `1.9` -> `1.10` |
| [org.eclipse:yasson](https://projects.eclipse.org/projects/ee4j.yasson) ([source](https://github.com/eclipse-ee4j/yasson)) | compile | patch | `3.0.3` -> `3.0.4` |
| [org.eclipse.parsson:parsson](https://github.com/eclipse-ee4j/parsson) | compile | patch | `1.1.6` -> `1.1.7` |
| [com.uber.nullaway:nullaway](https://github.com/uber/NullAway) |  | patch | `0.11.0` -> `0.11.1` |
| [io.hosuaby:inject-resources-junit-jupiter](https://github.com/hosuaby/inject-resources) | test | patch | `0.3.3` -> `0.3.5` |
| [org.codehaus.mojo:exec-maven-plugin](https://www.mojohaus.org/exec-maven-plugin) ([source](https://github.com/mojohaus/exec-maven-plugin)) | build | minor | `3.3.0` -> `3.4.0` |

---

### Release Notes

<details>
<summary>tukaani-project/xz-java</summary>

### [`v1.10`](https://github.com/tukaani-project/xz-java/releases/tag/v1.10): XZ for Java 1.10

[Compare Source](tukaani-project/xz-java@v1.9...v1.10)

</details>

<details>
<summary>eclipse-ee4j/yasson</summary>

### [`v3.0.4`](eclipse-ee4j/yasson@3.0.3...3.0.4)

[Compare Source](eclipse-ee4j/yasson@3.0.3...3.0.4)

</details>

<details>
<summary>eclipse-ee4j/parsson</summary>

### [`v1.1.7`](eclipse-ee4j/parsson@1.1.6...1.1.7)

[Compare Source](eclipse-ee4j/parsson@1.1.6...1.1.7)

</details>

<details>
<summary>uber/NullAway</summary>

### [`v0.11.1`](https://github.com/uber/NullAway/blob/HEAD/CHANGELOG.md#Version-0111)

[Compare Source](uber/NullAway@v0.11.0...v0.11.1)

-   Fix issue 1008 ([#&#8203;1009](uber/NullAway#1009))
-   JSpecify: read upper bound annotations from bytecode and add tests ([#&#8203;1004](uber/NullAway#1004))
-   Fix crash with suggested suppressions in JSpecify mode ([#&#8203;1001](uber/NullAway#1001))
-   Update to JSpecify 1.0 and use JSpecify annotations in NullAway code ([#&#8203;1000](uber/NullAway#1000))
-   Expose [@&#8203;EnsuresNonNull](https://github.com/EnsuresNonNull) and [@&#8203;RequiresNonNull](https://github.com/RequiresNonNull) in annotations package ([#&#8203;999](uber/NullAway#999))
-   Don't report initializer warnings on [@&#8203;NullUnmarked](https://github.com/NullUnmarked) constructors / methods ([#&#8203;997](uber/NullAway#997))
-   Strip annotations from MethodSymbol strings ([#&#8203;993](uber/NullAway#993))
-   JSpecify: fix crashes where declared parameter / return types were raw ([#&#8203;989](uber/NullAway#989))
-   JSpecify: Handle [@&#8203;nullable](https://github.com/nullable) elements for enhanced-for-loops on arrays ([#&#8203;986](uber/NullAway#986))
-   Features/944 tidy stream nullability propagator ([#&#8203;985](uber/NullAway#985))
-   Tests for loops over arrays ([#&#8203;982](uber/NullAway#982))
-   Bug fixes for array subtyping at returns / parameter passing ([#&#8203;980](uber/NullAway#980))
-   JSpecify: Handle [@&#8203;nonnull](https://github.com/nonnull) elements in [@&#8203;nullable](https://github.com/nullable) content arrays ([#&#8203;963](uber/NullAway#963))
-   Don't report [@&#8203;nullable](https://github.com/nullable) type argument errors for unmarked classes ([#&#8203;958](uber/NullAway#958))
-   External Library Models: Adding support for Nullable upper bounds of Generic Type parameters ([#&#8203;949](uber/NullAway#949))
-   Refactoring / code cleanups:
    -   Test on JDK 22 ([#&#8203;992](uber/NullAway#992))
    -   Add test case for [@&#8203;nullable](https://github.com/nullable) Void with override in JSpecify mode ([#&#8203;990](uber/NullAway#990))
    -   Enable UnnecessaryFinal and PreferredInterfaceType EP checks ([#&#8203;991](uber/NullAway#991))
    -   Add missing [@&#8203;test](https://github.com/test) annotation ([#&#8203;988](uber/NullAway#988))
    -   Fix typo in variable name ([#&#8203;987](uber/NullAway#987))
    -   Remove AbstractConfig class ([#&#8203;974](uber/NullAway#974))
    -   Fix Javadoc for MethodRef ([#&#8203;973](uber/NullAway#973))
    -   Refactored data clumps with the help of LLMs (research project) ([#&#8203;960](uber/NullAway#960))
-   Build / CI tooling maintenance:
    -   Various cleanups enabled by bumping minimum Java and Error Prone versions ([#&#8203;962](uber/NullAway#962))
    -   Disable publishing of snapshot builds from CI ([#&#8203;967](uber/NullAway#967))
    -   Update Gradle action usage in CI workflow ([#&#8203;969](uber/NullAway#969))
    -   Update Gradle config to always compile Java code using JDK 17 ([#&#8203;971](uber/NullAway#971))
    -   Update JavaParser to 3.26.0 ([#&#8203;970](uber/NullAway#970))
    -   Reenable JMH benchmarking in a safer manner ([#&#8203;975](uber/NullAway#975))
    -   Updated JMH Benchmark Comment Action ([#&#8203;976](uber/NullAway#976))
    -   Update to Gradle 8.8 ([#&#8203;981](uber/NullAway#981))
    -   Update to Error Prone 2.28.0 ([#&#8203;984](uber/NullAway#984))
    -   Update to Gradle 8.9 ([#&#8203;998](uber/NullAway#998))
    -   Update to WALA 1.6.6 ([#&#8203;1003](uber/NullAway#1003))

</details>

<details>
<summary>hosuaby/inject-resources</summary>

### [`v0.3.5`](hosuaby/inject-resources@v0.3.4...v0.3.5)

[Compare Source](hosuaby/inject-resources@v0.3.4...v0.3.5)

### [`v0.3.4`](hosuaby/inject-resources@v0.3.3...v0.3.4)

[Compare Source](hosuaby/inject-resources@v0.3.3...v0.3.4)

</details>

<details>
<summary>mojohaus/exec-maven-plugin</summary>

### [`v3.4.0`](https://github.com/mojohaus/exec-maven-plugin/releases/tag/3.4.0)

[Compare Source](mojohaus/exec-maven-plugin@3.3.0...3.4.0)

<!-- Optional: add a release summary here -->

#### 🚀 New features and improvements

-   Allow `<includePluginDependencies>` to be specified for the exec:exec goal ([#&#8203;432](mojohaus/exec-maven-plugin#432)) [@&#8203;sebthom](https://github.com/sebthom)

#### 🐛 Bug Fixes

-   Do not get UPPERCASE env vars ([#&#8203;427](mojohaus/exec-maven-plugin#427)) [@&#8203;wheezil](https://github.com/wheezil)

#### 📦 Dependency updates

-   Bump org.codehaus.mojo:mojo-parent from 82 to 84 ([#&#8203;434](mojohaus/exec-maven-plugin#434)) [@&#8203;dependabot](https://github.com/dependabot)
-   Bump org.codehaus.plexus:plexus-xml from 3.0.0 to 3.0.1 ([#&#8203;431](mojohaus/exec-maven-plugin#431)) [@&#8203;dependabot](https://github.com/dependabot)

#### 👻 Maintenance

-   Remove Log4j 1.2.x from ITs ([#&#8203;437](mojohaus/exec-maven-plugin#437)) [@&#8203;slawekjaranowski](https://github.com/slawekjaranowski)

#### 🔧 Build

-   Use Maven 3.9.7 and 4.0.0-beta-3 ([#&#8203;433](mojohaus/exec-maven-plugin#433)) [@&#8203;slawekjaranowski](https://github.com/slawekjaranowski)

</details>

---

### Configuration

📅 **Schedule**: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).

🚦 **Automerge**: Enabled.

♻ **Rebasing**: Whenever MR is behind base branch, or you tick the rebase/retry checkbox.

👻 **Immortal**: This MR will be recreated if closed unmerged. Get [config help](https://github.com/renovatebot/renovate/discussions) if that's undesired.

---

 - [ ] <!-- rebase-check -->If you want to rebase/retry this MR, check this box

---

This MR has been generated by [Renovate Bot](https://github.com/renovatebot/renovate).
<!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiIzNC4yNC4wIiwidXBkYXRlZEluVmVyIjoiMzQuMjQuMCJ9-->
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants