Make DataCleaner simple - remove complexity of Spark, Scala and dynamic extensions #1968

Closed
kaspersorensen opened this issue Nov 14, 2024 · 0 comments

DataCleaner is a complex tool. As its lead developer for years, I'm sorry to say that I don't think it's maintainable in its current state. I'd like to propose making DataCleaner maintainable by retaining what it is at its core for 99% of its users, and ditching the complexity that is not really used anymore anyway. This is specifically about making it easy to build and develop on DC, but also about making it easy to run on modern JVMs.

  • Remove the dynamic classloading / extensions / drivers and such. This has a huge technical complexity cost and makes the tool incompatible with newer JDKs.
  • Remove the Spark engine. Nobody uses DC for that sort of thing anymore.
  • Remove the Scala components. They add too much build complexity for the value they bring. This would mean getting rid of the "Visualizations" components though.

I'm going to make a branch for this, if nothing else for my own benefit of being able to build and run DC. But I think it should be considered the next major version of DC.
