Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove snakebite-py3 based HDFS hooks and sensors #31262

Merged
merged 1 commit into from
May 15, 2023

Conversation

potiuk
Copy link
Member

@potiuk potiuk commented May 12, 2023

Those Hooks were using technology that is not going to work in a new environment, with newer versions of Python supported, protobufs that reached end of life.

We should finally get rid of them. Users can still use older versions of providers with those old HDFS hooks and sensors if needed.


^ Add meaningful description above

Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named {pr_number}.significant.rst or {issue_number}.significant.rst, in newsfragments.

@potiuk
Copy link
Member Author

potiuk commented May 12, 2023

I think enough is enough.

We need to get rid of this monster with a clean cut once and forever. Without any deprecations etc.

@potiuk potiuk force-pushed the get-rid-of-hdfs branch 2 times, most recently from 7242a22 to 8167b02 Compare May 12, 2023 19:59
@potiuk
Copy link
Member Author

potiuk commented May 12, 2023

I have also added exception raised when old hdfs modules are used - it explains that the users shoudl either convert to webhdfs or downgrade provider

@potiuk
Copy link
Member Author

potiuk commented May 13, 2023

I updated it slightly (following failures in CI) - the classes remain there (excluded from the docs) and any attempt of using them will raise appropriate error (instructing the users to either downgrade or switch to WebHDFS).

I even went as far as explaining how they can downgrade to later hdfs provider supporting old hdfs:

If you want to use earlier provider you can downgrade to latest released 3.* version
using pip install apache-airflow-providers-hdfs==3.2.1 (no constraints)

I think this is very "fair" approach. HDFS has never been part of the image, so users had to always explicitly choose hdfs when installing airflow. Shall we include that in this wave of providers?

Copy link
Member

@hussein-awala hussein-awala left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For me, it seems reasonable, especially considering that there are improved alternatives for each sensor/hook that was removed. However, personally, I have not come across a similar suppression without deprecation in the past, so I would rather wait for @eladkal's approval before merging the PR.

docs/apache-airflow-providers-apache-hdfs/connections.rst Outdated Show resolved Hide resolved
Those Hooks were using technology that is not going to work in
a new environment, with newer versions of Python supported,
protobufs that reached end of life.

We should finally get rid of them. Users can still use older
versions of providers with those old HDFS hooks and sensors if
needed.
@potiuk
Copy link
Member Author

potiuk commented May 15, 2023

For me, it seems reasonable, especially considering that there are improved alternatives for each sensor/hook that was removed. However, personally, I have not come across a similar suppression without deprecation in the past, so I would rather wait for @eladkal's approval before merging the PR.

I would also love to hear from @eladkal. I think we have no clear deprecation policy for those kind of changes (how long should we have deprecation for ? Should it be released in some airflow constraints?

I think though it should be really case-by-case. I think we might be a little "less" protective here. Especially if we have good reason for it. But if we think some deprecation strategy is needed here, I am happy to get it instead (and release that one only as a follow-up).

@eladkal
Copy link
Contributor

eladkal commented May 15, 2023

I think removing with major release is the right approach here.
Deprecation first isn't useful and there is no action item needed from the users. The issue is that the package is not maintained thus we can not support it any longer

@potiuk potiuk merged commit 9a8c204 into apache:main May 15, 2023
1 check passed
@potiuk potiuk deleted the get-rid-of-hdfs branch May 15, 2023 09:35
@ephraimbuddy ephraimbuddy added the changelog:skip Changes that should be skipped from the changelog (CI, tests, etc..) label Jul 6, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:core-operators Operators, Sensors and hooks within Core Airflow area:providers changelog:skip Changes that should be skipped from the changelog (CI, tests, etc..) kind:documentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants