Match deequ support for spark 3.2.1 #93
Comments
Besides not being up to date with the deequ version in the package (pydeequ.deequ_maven_coord), are there other problems?
Hi @ghirardinicola, thanks for reaching out :) Honestly, we can’t tell, as we are still on Spark 3.1.2 in our framework (on hold while we decide whether to cut deequ out of a light version of the framework). We are also avoiding rolling out the DQ part of the framework globally because, unfortunately, we cannot fork the project internally at the moment to keep up with deequ (or contribute to the open source project; maybe one day we will find the capacity to do it).

On 3.1.2, the issue for us was that in certain scenarios the Spark app wouldn’t finish automatically if we had pydeequ actions, but we managed to sort that out by manually closing the Spark context gateway (I think you have this on your issue list too; at least I remember seeing an issue). We have not tested on 3.2 yet, but I believe there are issues in your issue list reporting that some analyzers do not work.

Therefore, it would give the pydeequ user base much more confidence in the project if there were a faster release cycle between Spark versions, deequ and pydeequ. Don’t get me wrong: we all understand that, as an open source project, the dev team is already kind enough to spend extra time working on it. But because this project and deequ are so cool, their roadmap is very important to potential heavy users like us. That’s why we are so interested and always asking for a new version :) We fully understand that things take time, but it would be cool to know whether there are still plans to keep updating the project. That would help the community decide whether to fork the project, go the extra mile to find time to go through all the code and start contributing to the OS project, or make some other decision.

Appreciate all your help and your kindness in making this open to everyone!
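For anyone hitting the same hang, here is a minimal sketch of the "manually closing the Spark context gateway" workaround mentioned above. It assumes the session object exposes `sparkContext._gateway` (a py4j gateway, which is a private PySpark attribute and could change between versions); the helper name `closing_spark` is made up for illustration and is not part of pydeequ.

```python
from contextlib import contextmanager


@contextmanager
def closing_spark(spark):
    """Yield a Spark session and always tear it down afterwards.

    Assumption: `spark` looks like a pyspark SparkSession, i.e. it has
    `sparkContext._gateway` (py4j gateway) and a `stop()` method.
    """
    try:
        yield spark
    finally:
        try:
            # Stop the py4j callback server so its thread cannot keep
            # the Python process alive after the job is done.
            spark.sparkContext._gateway.shutdown_callback_server()
        finally:
            # Stop the session even if the gateway shutdown raised.
            spark.stop()


# Hypothetical usage (requires pyspark and pydeequ to be installed):
#
#   with closing_spark(SparkSession.builder.getOrCreate()) as spark:
#       ...  # run pydeequ checks / analyzers here
```

Wrapping the job in a context manager means the cleanup also runs when a check raises, which is usually when a hanging app is most annoying.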
The
|
Does pydeequ==1.0.1 support Spark 3.2?
Yes, it also generally works with Spark 3.3 and 3.1, but some components don't.
Thanks for the reply. I'm trying to use the column profiler and [...] Is there any workaround?
There isn't; it is the same problem I had above.
1.1.0 is released with Spark 3.2/3.3 support - I would close this for now.
Deequ now supports Spark 3.2.1. However, pydeequ has not yet caught up to Spark 3.1.x.
Goal: update pydeequ to support the new deequ version and Spark 3.2.1.