Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change httpx to requests in file_task_handler #39799

Merged
merged 3 commits into from
Jun 15, 2024

Conversation

softyoungha
Copy link
Contributor

@softyoungha softyoungha commented May 24, 2024

closes: #39794


^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named {pr_number}.significant.rst or {issue_number}.significant.rst, in newsfragments.

Copy link

boring-cyborg bot commented May 24, 2024

Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about any anything please check our Contributors' Guide (https://github.com/apache/airflow/blob/main/contributing-docs/README.rst)
Here are some useful points:

  • Pay attention to the quality of your code (ruff, mypy and type annotations). Our pre-commits will help you with that.
  • In case of a new feature add useful documentation (in docstrings or in docs/ directory). Adding a new operator? Check this short guide Consider adding an example DAG that shows how users should use it.
  • Consider using Breeze environment for testing locally, it's a heavy docker but it ships with a working Airflow and a lot of integrations.
  • Be patient and persistent. It might take some time to get a review or get the final approval from Committers.
  • Please follow ASF Code of Conduct for all communication including (but not limited to) comments on Pull Requests, Mailing list and Slack.
  • Be sure to read the Airflow Coding style.
  • Always keep your Pull Requests rebased, otherwise your build might fail due to changes not related to your commits.
    Apache Airflow is a community-driven project and together we are making it better 🚀.
    In case of doubts contact the developers at:
    Mailing List: dev@airflow.apache.org
    Slack: https://s.apache.org/airflow-slack

Copy link
Member

@hussein-awala hussein-awala left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

httpx does not support CIDRs in NO_PROXY

I wonder if it's possible to implement a test for this support since it's the main motivation for this change.

@Taragolis Taragolis changed the title Change httpx to requests in file_task_handler Change httpx to requests in file_task_handler May 24, 2024
@Taragolis
Copy link
Contributor

I guess the main prom that there is no standardisation for no_proxy and different libraries / os could work differently

Most of the libraries use stdlib implementation from Lib/urllib/request.py

Request use own implementation which supports CIDR blocks, but only for ipv4:

import os

os.environ["no_proxy"] = "localhost,127.0.0.1,10.0.0.0/8,172.16.1.*,1:2:3::/64"
os.environ["NO_PROXY"] = "localhost,127.0.0.1,192.168.1.0/16"

# stdlib implementation
from urllib.request import proxy_bypass_environment
# requests implementation
from requests.sessions import should_bypass_proxies

hosts = ("localhost", "127.0.0.1", "127.0.0.2", "10.0.0.0", "192.168.1.32", "172.16.1.2", "[1:2:3::4]")
for host in hosts:
    print(f" {host} ".center(80, '='))
    print(f"stdlib Proxy Bypass Environment: {proxy_bypass_environment(host, None)}")
    print(f"requests Proxy Bypass Environment: {should_bypass_proxies(f"http://{host}", None)}")
================================== localhost ===================================
stdlib Proxy Bypass Environment: True
requests Proxy Bypass Environment: True
================================== 127.0.0.1 ===================================
stdlib Proxy Bypass Environment: True
requests Proxy Bypass Environment: True
================================== 127.0.0.2 ===================================
stdlib Proxy Bypass Environment: False
requests Proxy Bypass Environment: False
=================================== 10.0.0.0 ===================================
stdlib Proxy Bypass Environment: False
requests Proxy Bypass Environment: True
================================= 192.168.1.32 =================================
stdlib Proxy Bypass Environment: False
requests Proxy Bypass Environment: False
================================== 172.16.1.2 ==================================
stdlib Proxy Bypass Environment: False
requests Proxy Bypass Environment: False
================================== [1:2:3::4] ==================================
stdlib Proxy Bypass Environment: False
requests Proxy Bypass Environment: False

I don't know what is supposed to be happen in case if we change library in the future, e.g. from requests to urllib3, or use aiohttp in case if we turn internal code to async implementation (asyncio / anyio / trio)

So probably it is fine for replace httpx by requests, and I'm pretty sure that in past this was called by requests before we temporary remove usage of requests due to licence issue: #15781. However there is should not be a commitment that this behaviour remaining the same in the future, because this part of the code usually can't be changed without monkey patching third-party library / stdlib

@softyoungha
Copy link
Contributor Author

softyoungha commented May 24, 2024

@hussein-awala @Taragolis
I've added test code, but since the code might change in the future like #39799 (comment), I'm dissatisfied with dependency on requests library in the test code.
Is it appropriate to set up a temporary proxy app fixture to run in background for writing test?
It seems a bit too much for test

@Taragolis
Copy link
Contributor

Is it appropriate to set up a temporary proxy app fixture to run in background for writing test?
It seems a bit too much for test

I would not spend a time for writing a test proxy. If there is any changes in the future then this test should be adopted by someone who will made this changes. About new library, it just a potential scenario but maybe it is never happen.

@Taragolis Taragolis added the type:improvement Changelog: Improvements label May 24, 2024
- httpx does not support CIDRs in NO_PROXY
- simply, convert httpx to requests, issues done
- related issue: apache#39794
@eladkal eladkal added this to the Airflow 2.9.3 milestone Jun 9, 2024
@eladkal eladkal added type:bug-fix Changelog: Bug Fixes and removed type:improvement Changelog: Improvements labels Jun 9, 2024
@potiuk
Copy link
Member

potiuk commented Jun 14, 2024

I believe, this is the last httpx use in Airflow core - httpx is used in a few other scripts and API client (and it could be easily replaced by requests as well), so we could get rid of httpx dependency in hatch_build.py.

@potiuk
Copy link
Member

potiuk commented Jun 15, 2024

I believe, this is the last httpx use in Airflow core - httpx is used in a few other scripts and API client (and it could be easily replaced by requests as well), so we could get rid of httpx dependency in hatch_build.py.

On the other hand with -> #40256 it turned out that we have some implicit dependencies on httpx in core, so removing it is not good idea.

@potiuk potiuk merged commit 1ddadf5 into apache:main Jun 15, 2024
50 checks passed
Copy link

boring-cyborg bot commented Jun 15, 2024

Awesome work, congrats on your first merged pull request! You are invited to check our Issue Tracker for additional contributions.

jannisko pushed a commit to jannisko/airflow that referenced this pull request Jun 15, 2024
* Change httpx to requests in file_task_handler

- httpx does not support CIDRs in NO_PROXY
- simply, convert httpx to requests, issues done
- related issue: apache#39794

* Add cidr no_proxy test test_log_handlers.py

* Apply monkeypatch fixture

---------

Co-authored-by: scott-py <scott.py@kakaocorp.com>
utkarsharma2 pushed a commit that referenced this pull request Jul 2, 2024
* Change httpx to requests in file_task_handler

- httpx does not support CIDRs in NO_PROXY
- simply, convert httpx to requests, issues done
- related issue: #39794

* Add cidr no_proxy test test_log_handlers.py

* Apply monkeypatch fixture

---------

Co-authored-by: scott-py <scott.py@kakaocorp.com>
(cherry picked from commit 1ddadf5)
romsharon98 pushed a commit to romsharon98/airflow that referenced this pull request Jul 26, 2024
* Change httpx to requests in file_task_handler

- httpx does not support CIDRs in NO_PROXY
- simply, convert httpx to requests, issues done
- related issue: apache#39794

* Add cidr no_proxy test test_log_handlers.py

* Apply monkeypatch fixture

---------

Co-authored-by: scott-py <scott.py@kakaocorp.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:logging type:bug-fix Changelog: Bug Fixes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

HTTP_PROXY settings do not recognize Worker Pods within NO_PROXY CIDR range
6 participants