Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Try to make dataset objects totally unhashable #42054

Merged
merged 8 commits into from
Sep 6, 2024

Conversation

uranusjr
Copy link
Member

@uranusjr uranusjr commented Sep 6, 2024

Using the hash property on Dataset (and DatasetAlias) is problematic since subclassing is a possibility, and user code may not correctly implement hashing for Airflow’s purposes.

We should consider extra...important for tests.
Copy link
Member

@Lee-W Lee-W left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm good with this change, but not sure whether it should be categorized as a breaking change

@uranusjr
Copy link
Member Author

uranusjr commented Sep 6, 2024

This should generally not affect too many usages, but there’s no harm adding a note I guess. We can’t tell how those are used…

@uranusjr uranusjr marked this pull request as ready for review September 6, 2024 07:16
@uranusjr uranusjr merged commit 1c4a00b into apache:main Sep 6, 2024
51 checks passed
@uranusjr uranusjr deleted the dataset-unhashable branch September 6, 2024 07:16
potiuk added a commit to potiuk/airflow that referenced this pull request Sep 6, 2024
@potiuk
Copy link
Member

potiuk commented Sep 6, 2024

cc: @uranusjr @Lee-W -> we revert it for now as it causes some tests involving open-lineage (in amazon and common.io providers) to fail - see #42059 discussion. Apparently there is an implicit dependency betweeen datasets implementation and amazon/common.io tests

Please re-do it and add "full tests needed" label for this change to make sure all the tests are passing.

@kacpermuda
Copy link
Contributor

I think the OpenLineage is not involved here, but the hook level lineage collection is causing different providers to create datasets, so there is a dependency there.

topherinternational pushed a commit to topherinternational/airflow that referenced this pull request Sep 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants