Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Universal-pathlib 0.2.4 breaks Airlfow ObjectStoragePath object for local filesystem paths #276

Closed
potiuk opened this issue Sep 8, 2024 · 8 comments
Labels
bug 🐛 Something isn't working

Comments

@potiuk
Copy link

potiuk commented Sep 8, 2024

The universal_pathlib==0.2.4 raises an early import error about missing container/bucket when "file:///tmp" local file path object is created for Airflow ObjectStoragePath which derives from CloudPath.

So far, we have been using ObjectStoragePath in Airflow for all the different kinds of Airflow Providers - Cloud storage, local storage and others. And we have chosen (@bolkedebruin - @ap-- maybe that has been discussed before?), the ObjectStoragePath of ours derives from CloudPath:

https://github.com/apache/airflow/blob/e7c2d545119c917d31c70481f5380798bb705c1e/airflow/io/path.py#L80

However as of universal-pathlib 0.2.4 (specifically #264) there is an early error raised during import when the path passed to the ObjectStoragePath (and further down to CloudPath) is not of a bucket/container format.

While I understand the motivation behind the change, this means that our example dag that uses "file:///tmp/TEMP_PATH" starts failing during import.

Example here: https://github.com/apache/airflow/actions/runs/10758063762/job/29832946513#step:7:5018

Traceback (most recent call last):
  File "/opt/airflow/airflow/models/dagbag.py", line 352, in parse
    loader.exec_module(new_module)
  File "<frozen importlib._bootstrap_external>", line 850, in exec_module
  File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
  File "/opt/airflow/tests/system/providers/common/io/example_file_transfer_local_to_s3.py", line 35, in <module>
    TEMP_FILE_PATH = ObjectStoragePath("file:///tmp")
  File "/usr/local/lib/python3.9/site-packages/upath/implementations/cloud.py", line 30, in __init__
    raise ValueError("non key-like path provided (bucket/container missing)")
ValueError: non key-like path provided (bucket/container missing)

I am not sure about earlier design decisions there - but it seems a bit disconnect vs. how the "CloudPath" validation and expectation that it should only handle Cloud Paths (apparently) with the usage in Airlfow where Object Storage Path might also be a local filesysystem.

Not sure what is the right solution there - possibly ObjectStoragePath of ours shoudl derive actually from UPath and not CloudPath ? However I do not know what other consequeneces it might have - because it seems that for S3/GCS and others, using CloudPath is more appropriate (and I guess a number of things will not work when we do not derive from it).

@bolkedebruin @ap-- I think your deep knowledge of both might be needed to solve it.

For now I will limit universal-pathlib to 0.2.3 with link to that issue, because otherwise it will break some of the usages (i.e. those we recommend users in the example dags).

@potiuk potiuk changed the title Universal-pathlib 0.2. Universal-pathlib 0.2.4 breaks Airlfow ObjectStoragePath object for local filesystem paths Sep 8, 2024
@ap--
Copy link
Collaborator

ap-- commented Sep 8, 2024

Thanks for reporting @potiuk,

we should be able to fix this by moving the check to the S3, GCS, and Azure subclasses.
The main reason for a lot of the issues with ObjectStoragePath is that subclassing UPath to extend the UPath API is still not easy to do if you require support for multiple protocols.

This will change in the future.

It would still be great if we could add a third_party test to the universal_pathlib test-suite that verifies ObjectStoragePath functionality. That would allow me to refactor without breaking airflow downstream with every patch release.

@potiuk
Copy link
Author

potiuk commented Sep 8, 2024

Yeah I think currently it is a bit frigile, but not sure if we can come up with comprehensive tests in the universal path repo (@bolkedebruin? ) . Maybe we can start adding a 'how upath is used' kind of tests ?

Maybe a good solution for now could be to have some information when a beta/rc of update is published and we could run our tests against i in Airflow ? That might be more efficient.

BTW. I've been discussing an interesting long term idea with Alpha Omega - a fund that I am working with on improving security of Airflow Supply Chain - we are kicking-off the project on Tuesday at a keynote at Airflow Summit - and accidentally our methodology picked 16 project to pilot our work with and I am going to reach out to you @ap-- very quickly after the kick-off. .. So apart of improving security we discussed an idea of CI of CIs - where rather than having to add such integration tests into dependencies we could build in a set of cross-dependent project that would run tests automatically when their dependencies are preparing for a new release.

I think it is quite doable and we might as well see if we could do something like that with universal path and few others quicker than we thought

@ap--
Copy link
Collaborator

ap-- commented Sep 8, 2024

Maybe a good solution for now could be to have some information when a beta/rc of update is published and we could run our tests against i in Airflow?

That would be easy to do. I could trigger a workflow which would allow installing pre-release wheels via repository_dispatch in airflow once I push a pre-release version to pypi.

and I am going to reach out to you @ap-- very quickly after the kick-off

Sounds good ☺️

@ap--
Copy link
Collaborator

ap-- commented Sep 8, 2024

v0.2.5 should fix the issues in airflow.

It also adds support for 3.13, so there is now some time to work on a better release procedure / mechanism.

@ap-- ap-- added the bug 🐛 Something isn't working label Sep 8, 2024
@ap-- ap-- closed this as completed Sep 8, 2024
potiuk added a commit to potiuk/airflow that referenced this issue Sep 9, 2024
Universal Pathlib 0.2.4 adds extra validation for Paths and our
integration with local file paths Does not work with it. The
0.2.5 version fixes it.

Tracked in fsspec/universal_pathlib#276
@potiuk
Copy link
Author

potiuk commented Sep 9, 2024

Thank you! Changed apache/airflow#42090 to!=0.2.4 then :).

Yeah Python 3.13 is coming - I hope we will be able to move faster with this one - and Airlfow 3 will already by Python 3.13 support - but it very much depends on a lot of our dependencies, usually it takes 4-6 months befor enough of them catch up so that we can say "most of Airlfow + Providers will work for 3.13".

@potiuk
Copy link
Author

potiuk commented Sep 9, 2024

And yes - confirmed 0.2.5 fixes the issue!

@potiuk
Copy link
Author

potiuk commented Sep 9, 2024

Thanks for being so responsive.

@bolkedebruin
Copy link

Thanks @ap-- ;-). Happy to work something out with you as soon as I have a bit more bandwidth.

potiuk added a commit to apache/airflow that referenced this issue Sep 9, 2024
)

Universal Pathlib 0.2.4 adds extra validation for Paths and our
integration with local file paths Does not work with it. The
0.2.5 version fixes it.

Tracked in fsspec/universal_pathlib#276
potiuk added a commit to potiuk/airflow that referenced this issue Sep 9, 2024
…che#42090)

Universal Pathlib 0.2.4 adds extra validation for Paths and our
integration with local file paths Does not work with it. The
0.2.5 version fixes it.

Tracked in fsspec/universal_pathlib#276

(cherry picked from commit 2517da4)
potiuk added a commit to apache/airflow that referenced this issue Sep 11, 2024
) (#42101)

Universal Pathlib 0.2.4 adds extra validation for Paths and our
integration with local file paths Does not work with it. The
0.2.5 version fixes it.

Tracked in fsspec/universal_pathlib#276

(cherry picked from commit 2517da4)
ephraimbuddy pushed a commit to apache/airflow that referenced this issue Sep 13, 2024
) (#42101)

Universal Pathlib 0.2.4 adds extra validation for Paths and our
integration with local file paths Does not work with it. The
0.2.5 version fixes it.

Tracked in fsspec/universal_pathlib#276

(cherry picked from commit 2517da4)
kosteev pushed a commit to GoogleCloudPlatform/composer-airflow that referenced this issue Nov 9, 2024
…090) (#42101)

Universal Pathlib 0.2.4 adds extra validation for Paths and our
integration with local file paths Does not work with it. The
0.2.5 version fixes it.

Tracked in fsspec/universal_pathlib#276

(cherry picked from commit 2517da47694f021ae43a07deb7d2e33d456baef4)

GitOrigin-RevId: 9de0e59b8a97d2a38818b01e3d4659d5683c177c
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug 🐛 Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants