Implement copy-paste detection #549

Merged
merged 7 commits into from
Jun 18, 2024

Luro02 (Collaborator) commented on Jun 8, 2024

This PR partially implements #431 and should be able to detect #265 and #442 (if the code segments are sufficiently large).

What is working?

  • Relatively fast detection of duplicate segments. On the test_submission A1 it takes around 70 ms, compared to 98 ms for the comment language check and 48 ms for the Repeated Math Operation Check. It could be even faster if the StructuralHashCodeVisitor were improved to produce fewer hashCode collisions (at the cost of not detecting some duplicates). For example, also hashing the names of all named elements reduces the time to only 12 ms, but that is not feasible, because the check should detect duplicate code segments where only the name of a variable differs.
  • Detection of duplicates that ignores names and comments
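
To illustrate the trade-off described above, here is a minimal sketch of name-insensitive structural hashing. It does not use the PR's actual StructuralHashCodeVisitor or the real AST model; the `Node` class, its `kind`/`name` fields, and all method names are hypothetical stand-ins. The hash covers only node kinds and tree shape, so candidates that land in the same bucket are confirmed with a slower exact comparison.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public final class StructuralHashSketch {
    /** Hypothetical, simplified AST node: a kind, a name, and child nodes. */
    static final class Node {
        final String kind;          // e.g. "Assignment", "Add", "VariableRead"
        final String name;          // identifier; deliberately ignored by hash and equality
        final List<Node> children;

        Node(String kind, String name, Node... children) {
            this.kind = kind;
            this.name = name;
            this.children = List.of(children);
        }
    }

    /** Hash that depends only on node kinds and child order, never on names. */
    static int structuralHash(Node node) {
        int hash = node.kind.hashCode();
        for (Node child : node.children) {
            hash = 31 * hash + structuralHash(child);
        }
        return hash;
    }

    /** Slower exact comparison, used to rule out hash collisions. */
    static boolean structurallyEqual(Node a, Node b) {
        if (!a.kind.equals(b.kind) || a.children.size() != b.children.size()) {
            return false;
        }
        for (int i = 0; i < a.children.size(); i++) {
            if (!structurallyEqual(a.children.get(i), b.children.get(i))) {
                return false;
            }
        }
        return true;
    }

    /** Buckets statements by hash, then confirms each candidate pair exactly. */
    static List<Node[]> findDuplicates(List<Node> statements) {
        Map<Integer, List<Node>> buckets = new HashMap<>();
        List<Node[]> duplicates = new ArrayList<>();
        for (Node statement : statements) {
            List<Node> bucket =
                buckets.computeIfAbsent(structuralHash(statement), key -> new ArrayList<>());
            for (Node candidate : bucket) {
                if (structurallyEqual(candidate, statement)) {
                    duplicates.add(new Node[] {candidate, statement});
                }
            }
            bucket.add(statement);
        }
        return duplicates;
    }

    public static void main(String[] args) {
        // a = b + c  and  x = y + z  differ only in variable names.
        Node first = new Node("Assignment", null,
            new Node("VariableWrite", "a"),
            new Node("Add", null, new Node("VariableRead", "b"), new Node("VariableRead", "c")));
        Node second = new Node("Assignment", null,
            new Node("VariableWrite", "x"),
            new Node("Add", null, new Node("VariableRead", "y"), new Node("VariableRead", "z")));
        System.out.println(findDuplicates(List.of(first, second)).size() + " duplicate pair(s)");
    }
}
```

Including `name` in the hash would shrink the buckets and speed things up, but `a = b + c` and `x = y + z` would then no longer meet in the same bucket, which matches the reason given above for not hashing names.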

What is missing?

  • Adjusting the number of statements required to report a code duplicate based on where it is found (e.g. in an if-else) and how many differences there are (a 1:1 copy vs. one that requires multiple variables)
  • Handling code where the type is almost the same (the example where List/Set was used, but Collection would be necessary in a helper method)
  • Counting the number of parameters a potential helper method would require (code segments that would need a lot of parameters to be refactored into a method should not be linted)
  • Writing a lot of tests and checking the feasibility of creating a helper method
  • (Suggesting a helper method; difficult to implement, but might be worth it)
  • Adding some safeguards to check that the StructuralEquality works well (by comparing it to the slower version in debug mode)
  • Removing the CPDLinter code
  • Evaluating when a differing expression can be replaced by a parameter (technically this is always possible by doing something like if (paramIsTrue) { doA(); } else { doB(); })
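
As a concrete picture of the last point, the following sketch shows two near-duplicate methods that differ in a single statement, and a merged helper that selects the differing statement with a boolean parameter. All method names and the `shouldAdd` parameter are made up for illustration and do not come from the PR.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public final class FlagParameterSketch {
    // Two near-duplicates: only the middle statement differs.
    static List<String> withAdded(List<String> items, String item) {
        List<String> copy = new ArrayList<>(items);
        copy.add(item);
        Collections.sort(copy);
        return copy;
    }

    static List<String> withRemoved(List<String> items, String item) {
        List<String> copy = new ArrayList<>(items);
        copy.remove(item);
        Collections.sort(copy);
        return copy;
    }

    // Merged helper: the differing statement is chosen by a boolean parameter.
    static List<String> withChange(List<String> items, String item, boolean shouldAdd) {
        List<String> copy = new ArrayList<>(items);
        if (shouldAdd) {
            copy.add(item);
        } else {
            copy.remove(item);
        }
        Collections.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        List<String> items = List.of("b", "a");
        System.out.println(withChange(items, "c", true));
        System.out.println(withChange(items, "a", false));
    }
}
```

The merge always compiles, but a flag parameter arguably makes the helper harder to read than the two originals, which is presumably why the open question is when such a replacement is worth suggesting rather than whether it is possible.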

@Luro02 Luro02 marked this pull request as ready for review June 18, 2024 10:57
@Luro02 Luro02 merged commit 1f85b2d into Feuermagier:main Jun 18, 2024
2 checks passed