Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add --noproxy argument to ignore proxy for hosts or IP addresses. #410

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

otaconix
Copy link
Contributor

@otaconix otaconix commented Mar 4, 2025

I saw #296 as a good first issue and thought I'd have a go at it.

A few things to note:

  • I named the argument the same as curl's (--noproxy), as that seems to be what you were aiming for. This could potentially be confusing for users, as it's similar to --no-proxy, which undoes any --proxy arguments.
  • reqwest's NoProxy's applies a wildcard (*) only to hostnames, not IP addresses. curl does both. So there's a little workaround to implicitly add IPv4 and IPv6 full ranges (respectively 0.0.0.0/0 and ::/0).
  • I wanted to add tests for IPv6 and hostnames, but didn't see a way to do this easily: IPv6 would require testing on a host that has an IPv6 interface (not sure lo has that by default), and for hostnames, I thought I could use --resolve, but that turns the hostname into an IP address, so the hostname never reaches reqwest. If needed, I can take a closer look to see if I can add these tests.

Copy link
Collaborator

@blyxxyz blyxxyz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

For the name, maybe --disable-proxy-for? Bit of a mouthful though.

/// - A '*' matches all hosts, and effectively disables proxies altogether.
/// - IP addresses are allowed, as are subnets (in CIDR notation, i.e.: '127.0.0.0/8').
/// - Any other entry in the list is assumed to be a hostname.
#[clap(long, value_name = "no-proxy-list", value_delimiter = ',')]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this trim spaces? I see google.com, 192.168.1.0/24 as an example in the reqwest docs

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would appear that it does! See: https://docs.rs/reqwest/latest/src/reqwest/proxy.rs.html#474

I'll add a test to verify that.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant whether clap's value_delimiter trims spaces, so our logic matches up with reqwest's

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, snap, yeah, great question! I'll definitely test it then! I was toying with the idea to keep the list in a String so I could offload the full parsing to reqwest. I might have to revisit that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a few tests that should demonstrate that:

  • whitespace in --disable-proxy-for is trimmed
  • any request to a host not specified in --disable-proxy-for still gets proxied

src/cli.rs Outdated
Comment on lines 298 to 302
/// Comma-separated list of hosts for which not to use a proxy, if one is specified.
///
/// - A '*' matches all hosts, and effectively disables proxies altogether.
/// - IP addresses are allowed, as are subnets (in CIDR notation, i.e.: '127.0.0.0/8').
/// - Any other entry in the list is assumed to be a hostname.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you mention the environment variable here like we do for --proxy?

src/cli.rs Outdated
Comment on lines 1107 to 1110
if noproxy.contains(&"*".to_string()) {
// reqwest's NoProxy wildcard doesn't apply to IP addresses, while curl's does
noproxy_comma_delimited.push_str(",0.0.0.0/0,::/0");
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately this creates a difference between the CLI flag and the environment variable. To be fully correct we may have to reimplement the env variable handling...

HTTPie (presumably via requests) also follows the curl behavior here.

I suspect this is a mistake in reqwest, since the original PR that added * describes it as "effectively disabling use of the proxy". Maybe we can file an issue and/or PR there?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can file an issue and/or PR there?

That's an excellent idea. I had considered doing that, but got tempted (nerdsniped, perhaps?) to work around it, which, in retrospect, I shouldn't have. I hadn't even considered (or indeed known about? the environment variable reqwest reads.

Speaking of which, NoProxy::from_string's docs imply that it ignores its argument if the NO_PROXY/no_proxy environment variable is set, which doesn't actually seem to be the case from a cursory look at the code. Maybe I ought to mention that as well in the issue I'll open.

Thanks for pushing back here 👍

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Issue filed here: seanmonstar/reqwest#2579

tests/cli.rs Outdated
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, I think these tests would pass even if all --noproxy did was disable all proxies all the time. Can you add tests to check cases in which the pattern doesn't cover the URL and so the proxy does get used?

@otaconix
Copy link
Contributor Author

otaconix commented Mar 4, 2025

For the name, maybe --disable-proxy-for? Bit of a mouthful though.

Some of the ideas I had about the name:

  • --bypass-proxy
  • --ignore-proxy
  • curl's --noproxy

--disable-proxy-for is as good a suggestion as any, though as you say, a bit long.

Is there some "guideline"/design principle that's used to name flags? I originally added support for --proxy, but didn't give much thought to naming back then (I was just scratching my own itch).

@blyxxyz
Copy link
Collaborator

blyxxyz commented Mar 4, 2025

Most of the flags are inherited from HTTPie, so we don't have a lot of experience coming up with names unfortunately.

--noproxy feels too close to --no-proxy like you said. I don't have a strong opinion about bypass vs ignore vs disable, but I tacked on -for to make it clear that the option takes an argument. --disable-proxy sounds like it could be a binary flag that just toggles off all proxies. I'm not sure how confusing that is in practice.

Uhh, I'll use an LLM. I'll pretend these options already exist and ask it what they do and read what it hallucinates. That hopefully gets close to the intuition of real users. First I ask

What does xh's --proxy option do?

And it doesn't get the syntax quite right but it correctly guesses what the option is for.

Then I ask

Thanks! What about the --disable-proxy option?

And it gives examples where it doesn't take an argument, xh --disable-proxy https://example.com and xh --disable-proxy --auth user:password https://example.com. So it guessed incorrectly.

  • When I ask it about --disable-proxy-for it gets it exactly right, it even guesses that you can separate the hosts with commas: xh --proxy http://proxy.example.com:8080 --disable-proxy-for internal.example.com,api.local https://example.com.
  • For --noproxy it also guesses right, probably because it has been trained on lots of curl commands.
  • (For --no-proxy it guesses that it's a binary flag, so it picks up on the nuance! But it says that --no-proxy can override $http_proxy, which it can't, though maybe it should.)
  • For --ignore-proxy it incorrectly guesses a binary flag.
  • For --bypass-proxy it guesses correctly! I guess bypass makes it more clear than the other verbs that you're not completely disabling the proxies. This is a strong candidate.

I also asked it

Thanks! Is there an option for selectively bypassing the proxy for certain hosts?

And it came up with this, which is pretty interesting:
xh --proxy http://proxy.example.com:8080,no-proxy=localhost,no-proxy=127.0.0.1,no-proxy=example.com https://example.com

We have the https: and http: keys in the current syntax, so we could add a no-proxy:/noproxy:/no:/ignore: or something key. There's a pleasing pattern there, http: corresponds to $HTTP_PROXY, https: corresponds to $HTTPS_PROXY, so why not make no: correspond to $NO_PROXY? Then you could write --proxy 'http:http://localhost:8080,no:example.org'.

But maybe it's a little too cute, and I'd first want to check if the syntax for the --proxy option was invented for HTTPie or if it's also used somewhere else.

Finally I asked it

Thanks! I remember there's another option for selectively bypassing the proxy for certain hosts, what's it called?

and it hallucinated --proxy-ignore. Feels a little weird but it does more to suggest that it takes an argument than --ignore-proxy does.

Take all of this with a grain of salt of course.

@otaconix
Copy link
Contributor Author

otaconix commented Mar 5, 2025

@blyxxyz Can you take a look at the latest changes? And maybe also see if there's anything you'd like to add to the issue I raised at seanmonstar/reqwest#2579 ?

@blyxxyz blyxxyz self-requested a review March 5, 2025 19:17
@otaconix
Copy link
Contributor Author

otaconix commented Mar 5, 2025

I also asked it

Thanks! Is there an option for selectively bypassing the proxy for certain hosts?

And it came up with this, which is pretty interesting: xh --proxy http://proxy.example.com:8080,no-proxy=localhost,no-proxy=127.0.0.1,no-proxy=example.com https://example.com

We have the https: and http: keys in the current syntax, so we could add a no-proxy:/noproxy:/no:/ignore: or something key. There's a pleasing pattern there, http: corresponds to $HTTP_PROXY, https: corresponds to $HTTPS_PROXY, so why not make no: correspond to $NO_PROXY? Then you could write --proxy 'http:http://localhost:8080,no:example.org'.

But maybe it's a little too cute, and I'd first want to check if the syntax for the --proxy option was invented for HTTPie or if it's also used somewhere else.

This is getting into software archeology, but it kinda looks like httpie's syntax for proxies is a direct mapping of the underlying requests library: https://docs.python-requests.org/en/latest/user/advanced/#proxies

I do like the idea of adding something like no-proxy to the mini-DSL of the --proxy option, since --proxy is repeatable, and reqwest's NoProxy applies to a single proxy (it's just that as currently implemented in this PR, the --ignore-proxy-for is applied to every proxy). I'm just not sure how, if at all, xh will be able to map that to curl, since it doesn't look like curl supports anything like that out of the box.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants