-
Notifications
You must be signed in to change notification settings - Fork 43
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Universal-pathlib 0.2.4 breaks Airlfow ObjectStoragePath object for local filesystem paths #276
Comments
Thanks for reporting @potiuk, we should be able to fix this by moving the check to the S3, GCS, and Azure subclasses. This will change in the future. It would still be great if we could add a third_party test to the |
Yeah I think currently it is a bit frigile, but not sure if we can come up with comprehensive tests in the universal path repo (@bolkedebruin? ) . Maybe we can start adding a 'how upath is used' kind of tests ? Maybe a good solution for now could be to have some information when a beta/rc of update is published and we could run our tests against i in Airflow ? That might be more efficient. BTW. I've been discussing an interesting long term idea with Alpha Omega - a fund that I am working with on improving security of Airflow Supply Chain - we are kicking-off the project on Tuesday at a keynote at Airflow Summit - and accidentally our methodology picked 16 project to pilot our work with and I am going to reach out to you @ap-- very quickly after the kick-off. .. So apart of improving security we discussed an idea of CI of CIs - where rather than having to add such integration tests into dependencies we could build in a set of cross-dependent project that would run tests automatically when their dependencies are preparing for a new release. I think it is quite doable and we might as well see if we could do something like that with universal path and few others quicker than we thought |
That would be easy to do. I could trigger a workflow which would allow installing pre-release wheels via
Sounds good |
It also adds support for 3.13, so there is now some time to work on a better release procedure / mechanism. |
Universal Pathlib 0.2.4 adds extra validation for Paths and our integration with local file paths Does not work with it. The 0.2.5 version fixes it. Tracked in fsspec/universal_pathlib#276
Thank you! Changed apache/airflow#42090 to Yeah Python 3.13 is coming - I hope we will be able to move faster with this one - and Airlfow 3 will already by Python 3.13 support - but it very much depends on a lot of our dependencies, usually it takes 4-6 months befor enough of them catch up so that we can say "most of Airlfow + Providers will work for 3.13". |
And yes - confirmed 0.2.5 fixes the issue! |
Thanks for being so responsive. |
Thanks @ap-- ;-). Happy to work something out with you as soon as I have a bit more bandwidth. |
) Universal Pathlib 0.2.4 adds extra validation for Paths and our integration with local file paths Does not work with it. The 0.2.5 version fixes it. Tracked in fsspec/universal_pathlib#276
…che#42090) Universal Pathlib 0.2.4 adds extra validation for Paths and our integration with local file paths Does not work with it. The 0.2.5 version fixes it. Tracked in fsspec/universal_pathlib#276 (cherry picked from commit 2517da4)
) (#42101) Universal Pathlib 0.2.4 adds extra validation for Paths and our integration with local file paths Does not work with it. The 0.2.5 version fixes it. Tracked in fsspec/universal_pathlib#276 (cherry picked from commit 2517da4)
) (#42101) Universal Pathlib 0.2.4 adds extra validation for Paths and our integration with local file paths Does not work with it. The 0.2.5 version fixes it. Tracked in fsspec/universal_pathlib#276 (cherry picked from commit 2517da4)
…090) (#42101) Universal Pathlib 0.2.4 adds extra validation for Paths and our integration with local file paths Does not work with it. The 0.2.5 version fixes it. Tracked in fsspec/universal_pathlib#276 (cherry picked from commit 2517da47694f021ae43a07deb7d2e33d456baef4) GitOrigin-RevId: 9de0e59b8a97d2a38818b01e3d4659d5683c177c
The universal_pathlib==0.2.4 raises an early import error about missing container/bucket when "file:///tmp" local file path object is created for Airflow ObjectStoragePath which derives from CloudPath.
So far, we have been using ObjectStoragePath in Airflow for all the different kinds of Airflow Providers - Cloud storage, local storage and others. And we have chosen (@bolkedebruin - @ap-- maybe that has been discussed before?), the ObjectStoragePath of ours derives from CloudPath:
https://github.com/apache/airflow/blob/e7c2d545119c917d31c70481f5380798bb705c1e/airflow/io/path.py#L80
However as of universal-pathlib 0.2.4 (specifically #264) there is an early error raised during import when the path passed to the ObjectStoragePath (and further down to CloudPath) is not of a bucket/container format.
While I understand the motivation behind the change, this means that our example dag that uses "file:///tmp/TEMP_PATH" starts failing during import.
Example here: https://github.com/apache/airflow/actions/runs/10758063762/job/29832946513#step:7:5018
I am not sure about earlier design decisions there - but it seems a bit disconnect vs. how the "CloudPath" validation and expectation that it should only handle Cloud Paths (apparently) with the usage in Airlfow where Object Storage Path might also be a local filesysystem.
Not sure what is the right solution there - possibly ObjectStoragePath of ours shoudl derive actually from UPath and not CloudPath ? However I do not know what other consequeneces it might have - because it seems that for S3/GCS and others, using CloudPath is more appropriate (and I guess a number of things will not work when we do not derive from it).
@bolkedebruin @ap-- I think your deep knowledge of both might be needed to solve it.
For now I will limit universal-pathlib to 0.2.3 with link to that issue, because otherwise it will break some of the usages (i.e. those we recommend users in the example dags).
The text was updated successfully, but these errors were encountered: