Defining /http #63

Closed
victorb opened this issue Apr 18, 2018 · 27 comments

@victorb
Member

victorb commented Apr 18, 2018

Currently, libraries are implementing /http even though it hasn't been defined yet (1, 2). We should catch up with the implementations, define it, and then implement the behaviour.

As far as I know:

  • Can be wrapped within /tls to support https.
  • Is a path protocol (like /unix), meaning no further multiaddr components can follow it: given /http/a/b/c, we don't know where the path ends. This is a general problem for path protocols (currently just /unix).
  • Unsure how to deal with complex strings that contain ?, / and other characters that might confuse multiaddr parsers. Encode them, maybe?

Examples:

  • https://1.2.3.4:5001/api => /ip4/1.2.3.4/tcp/5001/tls/http/api
  • http://1.2.3.4/api => /ip4/1.2.3.4/tcp/80/http/api
  • http://1.2.3.4 => /ip4/1.2.3.4/tcp/80/http
  • http://1.2.3.4/api/ => /ip4/1.2.3.4/tcp/80/http/api/ (keeping trailing slash)
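The four examples above can be sketched as a conversion function. This is a minimal sketch, not any library's API: `url_to_multiaddr` is a hypothetical helper, it assumes IPv4 literal hosts, and it ignores query strings, fragments and credentials, which this thread has not settled.

```python
import ipaddress
from urllib.parse import urlsplit

# Default ports implied by each URL scheme.
DEFAULT_PORTS = {"http": 80, "https": 443}


def url_to_multiaddr(url):
    """Hypothetical helper: convert an http(s) URL with an IPv4 host into
    the multiaddr form proposed above."""
    parts = urlsplit(url)
    if parts.scheme not in DEFAULT_PORTS:
        raise ValueError(f"unsupported scheme: {parts.scheme!r}")
    # Only IPv4 literals are handled in this sketch; anything else raises.
    ipaddress.IPv4Address(parts.hostname)
    port = parts.port or DEFAULT_PORTS[parts.scheme]
    addr = f"/ip4/{parts.hostname}/tcp/{port}"
    if parts.scheme == "https":
        addr += "/tls"  # https is http wrapped in /tls
    addr += "/http"
    # The path is appended verbatim, so a trailing slash is preserved.
    return addr + parts.path


print(url_to_multiaddr("https://1.2.3.4:5001/api"))
```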

Issues mentioning /http and /tls/http (https)

@ghost

ghost commented Apr 18, 2018

The open questions to me are:

  • Encapsulation in a path protocol
  • Host header
  • Query string, fragment, authentication
  • Proxying

Unsure of how to deal with complex strings that contain ?, / and other characters that might confuse multiaddr. Encode them maybe?

A question mark is fine, and a slash could be fine too if libraries stopped simply splitting on every slash. Instead, string multiaddrs should probably be parsed step by step, with each protocol reading as many of the remaining bytes as it thinks necessary.
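A minimal sketch of this step-by-step parsing, in which each protocol decides how many components it consumes and a path protocol such as /http takes everything that remains. The protocol table `VALUE_COUNT` is a simplified assumption for illustration, not the real multiaddr protocol table.

```python
# Number of value components each protocol consumes. PATH marks a path
# protocol, which (in this sketch) consumes everything that is left.
PATH = -1
VALUE_COUNT = {"ip4": 1, "ip6": 1, "dns": 1, "tcp": 1, "udp": 1,
               "tls": 0, "ws": 0, "quic": 0, "http": PATH, "unix": PATH}


def parse_multiaddr(addr):
    """Parse a string multiaddr into (protocol, value) pairs, letting each
    protocol decide how much of the remaining input it reads."""
    if not addr.startswith("/"):
        raise ValueError("multiaddr must start with '/'")
    rest = addr[1:]
    result = []
    while rest:
        name, _, rest = rest.partition("/")
        if name not in VALUE_COUNT:
            raise ValueError(f"unknown protocol: {name!r}")
        count = VALUE_COUNT[name]
        if count == PATH:
            # A path protocol reads all remaining bytes, '/' and '?' included.
            value = "/" + rest if rest else None
            rest = ""
        elif count == 1:
            value, _, rest = rest.partition("/")
            if not value:
                raise ValueError(f"{name} requires a value")
        else:
            value = None
        result.append((name, value))
    return result


print(parse_multiaddr("/ip4/1.2.3.4/tcp/80/http/some/path?key=value"))
```

Note that /dns/foo.com/http/foo/bar/quic parses with /foo/bar/quic as the HTTP path, which is exactly the encapsulation problem raised in this thread.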

Examples:

These look good to me for a start

Encapsulation in a path protocol

Is a path protocol (as /unix), meaning no further multiaddrs can follow specifying /http/a/b/c as we don't know when it ends. This is a general problem for path protocols (currently just /unix)

Yep -- we'll still want to anticipate a way to encapsulate stuff within a path protocol, though, e.g. a WebSocket at a specific HTTP path, or /ipfs over a unix socket. In the binary form it's fine (everything is length-prefixed), but in the string/human-readable form it's tricky, since we rely on forward slashes to delimit protocols and values. There are several options:

  1. Length-prefix the path, e.g. /http/9some/path (bad, humans would need to count characters)
  2. Escape slashes in the path, e.g. /http/some\/path (okay, easier for machines to parse (split-on-slash))
  3. Put a delimiter around paths, e.g. /http/$/some/path$ or /http/(/some/path) (okay, easier for humans to read and write)
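To make the three options concrete, here is a sketch that encodes the path some/path each way, plus a split-on-slash parser that honours option 2's escapes. All function names are hypothetical.

```python
def length_prefixed(path):
    """Option 1: prefix the path with its length in characters."""
    return f"/http/{len(path)}{path}"


def escaped(path):
    """Option 2: escape backslashes, then slashes, inside the path."""
    return "/http/" + path.replace("\\", "\\\\").replace("/", "\\/")


def delimited(path, open_="(", close=")"):
    """Option 3: wrap the path in delimiters (assumes the path does not
    contain the closing delimiter)."""
    if close in path:
        raise ValueError("path contains the closing delimiter")
    return f"/http/{open_}{path}{close}"


def split_escaped(addr):
    """Split on unescaped '/' and resolve escapes in each component."""
    parts, cur, i = [], [], 0
    while i < len(addr):
        if addr[i] == "\\" and i + 1 < len(addr):
            cur.append(addr[i + 1])
            i += 2
        elif addr[i] == "/":
            parts.append("".join(cur))
            cur = []
            i += 1
        else:
            cur.append(addr[i])
            i += 1
    parts.append("".join(cur))
    return parts[1:]  # drop the empty component before the leading '/'


print(length_prefixed("some/path"), escaped("some/path"), delimited("/some/path"))
print(split_escaped(escaped("some/path") + "/ws"))
```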

We should split this particular question into a separate issue.

Host header

There is one more thing to consider: the Host header. None of the other HTTP headers I can think of matter for addressing the endpoint, since they only deal with representations and encodings, but the Host header is crucial for addressing.

The most elegant option seems to be /http/example.com/some/path, but then we'd have to make the host part mandatory, because we can't safely distinguish a hostname from a plain path element (domain regexes are far from enough).
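A quick illustration of why a domain regex can't tell hosts and path elements apart, using a typical dot-separated pattern (an assumption; real validators vary). With a mandatory host, parsing becomes trivial because the first segment is always the host. Both helpers are hypothetical.

```python
import re

# A typical "looks like a domain name" pattern: dot-separated labels.
DOMAIN_RE = re.compile(r"(?:[a-z0-9-]+\.)+[a-z0-9-]+", re.IGNORECASE)


def looks_like_host(segment):
    return DOMAIN_RE.fullmatch(segment) is not None


def split_host_path(value):
    """With a mandatory host, the first segment is always the host."""
    host, sep, path = value.partition("/")
    return host, sep + path


# Path elements like "index.html" match, while "localhost" does not.
for seg in ("example.com", "index.html", "v1.2", "localhost"):
    print(seg, looks_like_host(seg))
print(split_host_path("example.com/some/path"))
```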

Query string, fragment, authentication

Two more things that matter for addressing are query strings and fragments. I have a hunch they can just be represented as they are in a URL, e.g. /http/some/path?key=value#fragment.
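Under this URL-like representation, the /http value can be split the same way a URL is: the fragment after the first #, then the query after the first ?. A sketch with a hypothetical `split_http_value` helper:

```python
from urllib.parse import parse_qs


def split_http_value(value):
    """Split an /http value into (path, query dict, fragment), in URL order:
    the fragment is removed first, then the query string."""
    rest, _, fragment = value.partition("#")
    path, _, query = rest.partition("?")
    return path, parse_qs(query), fragment


print(split_http_value("/some/path?key=value#fragment"))
```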

I think authentication (user[:pass]@host) should not be part of the address.

Proxying

This can likely be as simple as a /socks5 multiaddr encapsulating the /http multiaddr.

@vasco-santos
Member

Hey, I think that we should definitely advance with this topic.

The initial examples provided by @victorbjelkholm are good, and we can add other examples that we intend to support, in order to find a uniform solution for all of them.

Regarding @lgierth's notes, I agree with every point except the encapsulation in a path protocol. IMHO, adding delimiters to a path would be really odd for someone using multiaddr, and it could be error-prone. So I believe we should create a separate issue for this discussion, as you suggested. I will think about other options meanwhile.

@ghost

ghost commented Apr 19, 2018

adding delimiters to a path will be really odd for someone using multiaddr, and it could be error prone

That's fine, it can be a separate multiaddr protocol, so people can use /http without special syntax in almost all cases, and in the special case of encapsulating something within /http, they can use the special protocol, e.g. /httpx/(/path)/ws. That's why it'd be easy to take care of it later.

@vasco-santos
Member

I see... Adding a special protocol for this special case seems fine to me then!

@bobheadxi

Has there been any progress with this?

@ntninja
Contributor

ntninja commented Jan 27, 2019

Bikeshed + more detailed spec proposal:
I think curly braces would be more aesthetically pleasing than parentheses: /http/{/path}/ws
There should also be fewer cases where braces conflict with something in the path than with parentheses, but in case they do… Can we standardize on the backslash for escaping characters in paths in general? The following should then be equivalent:

  1. /http/{/path\}}
  2. /http/path\}

…and also this:

  1. /http/{/host\{/path}
  2. /http/host\{/path

…while these should be invalid (i.e. not equivalent to any of the above!):

  • /http/host{/path
  • /http/path}

As mentioned above, the extra braces around HTTP's value should always be allowed, but only required if some other protocol is wrapped within it. For example:

  • /http/{host\{/path}/wss

The canonical representation should always be the one with the fewest braces, reducing the number of bytes required and improving readability. Libraries should always emit the canonical representation and must accept both forms. All of this of course also applies to Unix sockets and whatever other path-like protocols we find.
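A sketch of these brace-and-backslash rules as a hypothetical parser/formatter pair. Assumptions beyond the proposal: the unbraced value always starts with '/', a backslash escapes any character, and `format_http` emits braces only when another protocol follows (the canonical form described above).

```python
def _read_escaped(s, i, braced):
    """Read a path value starting at s[i], resolving backslash escapes.
    Braced: the value ends at the first unescaped '}'. Unbraced: it runs
    to the end of the string and bare braces are errors."""
    out = []
    while i < len(s):
        c = s[i]
        if c == "\\":
            if i + 1 >= len(s):
                raise ValueError("dangling backslash")
            out.append(s[i + 1])
            i += 2
        elif c == "}" and braced:
            return "".join(out), i + 1
        elif c in "{}":
            raise ValueError(f"unescaped {c!r} in path")
        else:
            out.append(c)
            i += 1
    if braced:
        raise ValueError("missing closing '}'")
    return "".join(out), i


def parse_http(addr):
    """Return (path, rest) for a string starting with /http/."""
    if not addr.startswith("/http/"):
        raise ValueError("not an /http address")
    if addr[6:7] == "{":
        path, end = _read_escaped(addr, 7, braced=True)
        rest = addr[end:]
        if rest and not rest.startswith("/"):
            raise ValueError("expected '/' after '}'")
        return path, rest
    # Unbraced: the '/' after /http belongs to the path, which is terminal.
    path, _ = _read_escaped(addr, 5, braced=False)
    return path, ""


def _escape(path):
    return "".join("\\" + c if c in "{}\\" else c for c in path)


def format_http(path, rest=""):
    """Emit the canonical form: braces only when something follows."""
    if not path.startswith("/"):
        raise ValueError("path must start with '/'")
    if rest:
        return "/http/{" + _escape(path) + "}" + rest
    return "/http" + _escape(path)


print(parse_http(r"/http/{/host\{/path}/wss"))
print(format_http("/path}"))
```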

Sounds good? If you disagree on my bikeshed's color, please also reply so that we can sort this out quickly. 🙂

@Stebalien
Member

I have two small concerns with that:

  1. Being able to concatenate both string and binary multiaddrs is really nice (and we use this everywhere). If {} is optional, this no longer applies to string multiaddrs (/http/path + /ws -> /http/path/ws instead of /http/{path}/ws).
  2. /http/{}/foobar looks really weird.

So, what if we just made it mandatory? That is, /http/foobar has no path, /http/{/path/to/thing} has one?
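The concatenation argument can be checked with a sketch of the mandatory-brace variant. `parse_http_braced` is hypothetical and, for brevity, does not handle escaped braces inside the path.

```python
def parse_http_braced(addr):
    """Parse a string starting with '/http' under the mandatory-brace rule:
    '/http' alone has no path, '/http/{...}' carries one.
    Returns (path or None, remaining multiaddr)."""
    if not addr.startswith("/http"):
        raise ValueError("not an /http component")
    rest = addr[len("/http"):]
    if rest and not rest.startswith("/"):
        raise ValueError("not an /http component")  # e.g. /https
    if rest.startswith("/{"):
        end = rest.find("}")
        if end == -1:
            raise ValueError("missing closing '}'")
        return rest[2:end], rest[end + 1:]
    return None, rest


# Plain string concatenation keeps working: "/http/{...}" + "/ws".
print(parse_http_braced("/http/{/path/to/thing}" + "/ws"))
print(parse_http_braced("/http" + "/ws"))
```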

@Stebalien
Member

@vasco-santos you objected to the encapsulation. Mind taking a look at my comment?

@Stebalien
Member

@Alexander255 please paint this bikeshed.

@ghost

ghost commented Feb 13, 2019

Have a bunch of notes on /http, putting a TODO for myself here

@ntninja
Contributor

ntninja commented Feb 13, 2019

@Stebalien: I have an alternate proposal for solving this at #87:

Since, as you correctly pointed out, the path is optional for /http anyway and, on the other hand, we're omitting some potentially useful attributes, I've suggested a key-value syntax instead. The TL;DR basically is this:

  • /http(host=example.com,base=/api/v1)
  • /http(base=/endpoint\(1:2\))

More examples:

  • /tls(sni=example.com)
  • /ip6(scope=6)/fe00::32/tcp/80/http
  • /wss(host=example.com:4443,base=/api/v1,user=john,password=doh,cookie=bla=blab)/tls/ws
    • Note: The name host here refers to the HTTP Host header and has nothing to do with where the connection will actually be made.
  • /wss/tls/ws

(It also comes with an exact description of the proposed text syntax, which is not reproduced here.)

This would also allow for your proposed case of painless concatenation (/http/path/ws), since /http(base=/path) + /wss = /http(base=/path)/wss. (I must point out, however, that stacking WebSockets on top of HTTP like this does not make any sense.)
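A sketch of how the key-value syntax could be parsed. Since the exact grammar from #87 is not reproduced here, the rules below are assumptions: parameters sit in parentheses directly after the protocol name, pairs are comma-separated, only the first = splits key from value, and a backslash escapes the next character. Pairing protocol names with their values (e.g. /tcp with 80) is left out; the hypothetical `parse_segments` only returns raw segments.

```python
def _split_top_level(addr):
    """Split on '/' outside parentheses; backslash escapes are kept
    intact so the parameter parser can resolve them later."""
    parts, cur, depth, i = [], [], 0, 0
    while i < len(addr):
        c = addr[i]
        if c == "\\" and i + 1 < len(addr):
            cur.append(addr[i:i + 2])
            i += 2
            continue
        if c == "(":
            depth += 1
        elif c == ")":
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced ')'")
        if c == "/" and depth == 0:
            if cur:
                parts.append("".join(cur))
            cur = []
        else:
            cur.append(c)
        i += 1
    if depth != 0:
        raise ValueError("unbalanced '('")
    if cur:
        parts.append("".join(cur))
    return parts


def _parse_params(body):
    """Parse 'k1=v1,k2=v2' with backslash escapes."""
    params, key, cur, i = {}, None, [], 0
    while i < len(body):
        c = body[i]
        if c == "\\":
            if i + 1 >= len(body):
                raise ValueError("dangling backslash")
            cur.append(body[i + 1])
            i += 2
            continue
        if c == "=" and key is None:
            key = "".join(cur)
            cur = []
        elif c == ",":
            if key is None:
                raise ValueError("parameter without '='")
            params[key] = "".join(cur)
            key, cur = None, []
        else:
            cur.append(c)
        i += 1
    if key is not None:
        params[key] = "".join(cur)
    elif cur:
        raise ValueError("parameter without '='")
    return params


def parse_segments(addr):
    """Return (segment, params) pairs, e.g. ('http', {'host': ...})."""
    result = []
    for seg in _split_top_level(addr):
        if "(" in seg and seg.endswith(")"):
            open_at = seg.index("(")
            result.append((seg[:open_at], _parse_params(seg[open_at + 1:-1])))
        else:
            result.append((seg, {}))
    return result


print(parse_segments("/http(host=example.com,base=/api/v1)"))
print(parse_segments("/http(base=/path)" + "/wss"))
```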

Let me know what you think.

@Stebalien
Member

I've taken a look, written a comment, and closed my browser. I'll try to comment on that issue ASAP. Making multiaddr solid is going to be really important.

@ghost

ghost commented Apr 1, 2019

Putting the issues of additional key-value params (#87) and encapsulation-in-paths aside, my instinct is that we should have /http addresses stick as closely to URLs as we can:

/http/example.net/file.txt
/http/_/file.txt
/http/user:pass@example.net/file.txt?k=v
  • In almost all cases, you will want a Host header, so it seems reasonable to make its presence the default case and its absence the special case (_). The same probably applies to /tls and SNI as well.
  • It seems useful to somewhat match URL syntax from multiple points of view:
    • The "UX" of /http addresses will be familiar
    • Less complexity
    • In most cases you'll build a URL from the multiaddr anyway, to pass it to your HTTP client library of choice.
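A sketch of building that URL from an /http value of this form, given the address being dialed. `http_value_to_url` is hypothetical and handles IPv4 or DNS dial hosts only (IPv6 literals would need brackets). The URL's host is what the Host header would carry; connecting to `dial_host` rather than resolving that host is up to the HTTP client and isn't shown.

```python
def http_value_to_url(dial_host, dial_port, value, tls=False):
    """Build a URL from the dial address (e.g. from /ip4/.../tcp/...) and
    the value of an /http component of the form host[/path][?query].
    A host of '_' means "no Host header", so the dial host is used."""
    scheme = "https" if tls else "http"
    host, sep, rest = value.partition("/")
    if host == "_":
        host = dial_host
    default_port = 443 if tls else 80
    if dial_port != default_port:
        host = f"{host}:{dial_port}"
    return f"{scheme}://{host}{sep}{rest}"


print(http_value_to_url("1.2.3.4", 80, "_/file.txt"))
print(http_value_to_url("1.2.3.4", 8080, "user:pass@example.net/file.txt?k=v"))
```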

I don't necessarily see a need for expressing cookies and other data in /http multiaddrs. I think it's even arguable whether user:pass@ is still a good idea nowadays, but then again, matching URLs is good.

To me it seems we'd only want what's necessary to route the connection to the correct remote endpoint. That includes e.g. fingerprints of remote public keys to authenticate the other end, but I'm not convinced it should include authorization (session cookies).

@Stebalien
Member

What about unix domain sockets? We can probably just use _ in that case, but it still feels funky. We could also just require HTTP/1.1 (and therefore require the Host header).

TLS doesn't necessarily require domains. What if we make the "sni" argument configurable? For example: /tls/subject:google.com;hash:Qm.../ (encoded as a CBOR object in the binary version). _ would just mean "implied subject" and "any" would mean "allow anything, just speak TLS". We could even use this to specify version requirements (unless we want to make that part of the protocol itself).
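A sketch of parsing the semicolon-separated /tls argument suggested above, string form only (the CBOR binary form isn't covered). The option names come from the example; `parse_tls_arg` and its return shape are hypothetical.

```python
def parse_tls_arg(arg):
    """Parse a /tls value like 'subject:google.com;hash:...'.
    '_' = implied subject (derive it from e.g. a preceding /dns),
    'any' = accept any certificate, just speak TLS."""
    if arg == "_":
        return {}  # no explicit options: use the implied subject
    if arg == "any":
        return {"any": True}
    opts = {}
    for item in arg.split(";"):
        key, sep, val = item.partition(":")
        if not sep or not key:
            raise ValueError(f"malformed option: {item!r}")
        opts[key] = val
    return opts


print(parse_tls_arg("subject:google.com"))
```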

I agree cookies are a separate issue. Multiaddrs shouldn't touch bearer tokens.

@ghost

ghost commented Apr 1, 2019

What about unix domain sockets? We can probably just use _ in that case but it still feels funky.

What about them? There's no hostname there, just /unix + /path/to/sock. If you mean http-over-socket, you'd encapsulate /http/... within /unix/... and use the to-be-determined parentheses-or-whatever syntax when representing the result as a string.

TLS doesn't necessarily require domains. What if we make the "sni" argument configurable?

I have a hunch we'll have an easier time going with separate protocols for variations of TLS, rather than parameterizing one protocol -- e.g. something like /tls/example.com for the HTTPS case, /tls+pk/<multihash> for authenticating the remote end, etc. If it turns out the flexible parameterized approach is necessary, there can still be a /tls+x protocol for it.

You'll notice I'm trying to avoid parameters :) But if we really have to, we can do it.

@Stebalien
Member

If you mean http-over-socket, you'd encapsulate /http/... within /unix/... and use the to-be-determined parentheses-or-whatever syntax when representing the result as a string.

My point is more that domain names don't really make sense in that context.

I have a hunch we'll have an easier time going with separate protocols for variations of TLS, rather than parameterizing one protocol

It would be nice to support /dns/xyz/tls/subject:xyz;ca:<multihash>. But we could do that with multiple protocols: /dns/xyz/tls/xyz/ca/<multihash> (inconsistent with /ip6zone but, IMO, that was a mistake on my part).


Actually, this brings up the issue of libp2p support. For example, we now have the /quic protocol with no arguments. However, the QUIC protocol has mandatory "tls" support. In libp2p, we end up with addresses that look like /ip4/.../tcp/.../quic/p2p/Qm... and then we let the QUIC transport check the peer ID. You'll notice the lack of an SNI parameter. Our current libp2p over QUIC handshake spec (libp2p/specs#151) actually reserves the SNI field for future version negotiation (https://github.com/libp2p/specs/pull/151/files#r271020919).

@ntninja
Contributor

ntninja commented Apr 5, 2019

@lgierth: So parameters are off the table? For what it's worth, I've updated the proposal to include a relatively compact binary representation for parameters, but a more customized, parameter-type-specific approach may be worth the hassle to shave off some extra bytes.

I have a hunch we'll have an easier time going with separate protocols for variations of TLS, rather than parameterizing one protocol -- e.g. something like /tls/example.com for the HTTPS case, /tls+pk/<multihash> for authenticating the remote end, etc. If it turns out the flexible parameterized approach is necessary, there can still be a /tls+x protocol for it.

How do you express “send SNI X and expect fingerprint Y in response” with this? Add an extra /tls+pk+sni/example.com:<multihash> for that? The combinatorics may get really ugly with that approach, and you'd still end up having to encode the hostname several times in the basic case of /dns/example.com/tcp/443/tls/http.
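The combinatorics concern can be made concrete: if each combination of n optional TLS features gets its own protocol name, 2^n names are needed. A small sketch with a hypothetical naming scheme (`tls+pk+sni` and so on):

```python
from itertools import combinations


def tls_variants(features):
    """List one protocol name per subset of optional TLS features."""
    names = []
    for r in range(len(features) + 1):
        for combo in combinations(features, r):
            names.append("+".join(("tls",) + combo))
    return names


print(tls_variants(["pk", "sni"]))
print(len(tls_variants(["pk", "sni", "ca", "ver"])))
```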

Just saying – obviously we disagree on this. 😉

@Stebalien
Member

Stebalien commented Aug 15, 2019

Motivation for revisiting: ipfs/kubo#6560

Summary so far:

  • /http currently takes no arguments.
  • There are existing users of the http protocol taking no arguments (p2p-webrtc-direct).
  • We currently have both http and https but should probably have /tls/http.
  • Technically, /ws should be /http/ws...
  • HTTP 1.1 requires a Host header.
  • Some protocols (but surprisingly few) require specifying a path.
  • The only way we can currently encode a path inside of a multiaddr is to make the protocol with the path argument terminal. That is, given /dns/foo.com/http/foo/bar/quic/..., /foo/bar/quic/... would all be treated as part of the HTTP path.

Questions:

  • Eventually move from /https (and /wss) to /tls/http (and /tls/ws)?
  • Eventually move from /ws to /http/ws?
  • Can we punt on the path part? The DNS config work doesn't need it, it just needs a host.
    • Is it sufficient to infer the host? Where necessary, we could specify paths of the form /dns/foobar.com/http. We could even allow /ip4/xyz/tcp/.../dns/foobar.com/http.
    • Should we add a host protocol? This would allow us to explicitly specify the host without that DNS hack.
  • Can we break p2p-webrtc-direct (unlikely), or somehow migrate without breaking anything?
    • If so, should we simply require that the path starts with a (possibly redundant) host?
  • Should we define a new protocol that takes a path?
  • How do we handle the fact that paths are currently terminal?

@tomaka
Member

tomaka commented Aug 15, 2019

Technically, /ws should be /http/ws...

We had a small discussion about that within Parity.
The WebSockets RFC mentions that it's a separate protocol from HTTP: https://tools.ietf.org/html/rfc6455#section-1.7

@Stebalien
Member

Thanks for making that simpler!

We're going to have a meeting on zoom (https://protocol.zoom.us/my/stebalien) at 15:00 UTC Thursday (today?) in case you'd like to join (or @Alexander255). The goal is to unblock ipfs/kubo#6560 without hating ourselves in the future so we're going to avoid trying to solve the entire problem all at once (just avoid hating ourselves in the future).

@lidel
Member

lidel commented Aug 15, 2019

Dumping my notes from meeting:

  • conversion from an https:// URL to a multiaddr is lossy right now: we drop the path and basic auth credentials
  • identified protocol-agnostic need:
  • open question: how to represent protocol-specific parameters?
    • use /protocol(key=value) notation from Proposal: Add keyword arguments to protocols #87 and Defining /http #63 (comment) (personally like this best out of existing proposals)
      • /ip4/127.0.0.1/tcp/8080/tls(sni=example.com)/
      • /ip4/127.0.0.1/tcp/8080/http(hostname=example.com,base=/api/v1,user=john,password=doh,cookie=bla:bla)/
      • /ip4/127.0.0.1/tcp/8080/tls(sni=example.com)/http(hostname=something-else.com) (enables censorship circumvention via domain-fronting)
    • add nestable key-value "parameter protocol" that applies to previous non-key-value protocol. HTTP path aka base is tricky as it includes / which needs to be escaped somehow:
      • /ip4/127.0.0.1/tcp/8080/http/kv/key1/value1/kv/key2/value2/
        /ip4/127.0.0.1/tcp/8080/http/kv/host/example.com/kv/base/api\/v2" (escape with \ )
        /ip4/127.0.0.1/tcp/8080/http/kv/host/example.com/kv/base/"/api/v2" (use quotes)
        /ip4/127.0.0.1/tcp/8080/http/kv/host/example.com/kv/base/"/api/v2"/kv/user/joe/kv/password/joepass 😬
    • use Matrix notation described in https://www.w3.org/DesignIssues/MatrixURIs.html
      • /ip4/127.0.0.1/tcp/8080/tls;sni=example.com/
      • /ip4/127.0.0.1/tcp/8080/http;hostname=example.com;base=/api/v1;user=john;password=doh;cookie=bla/
      • /ip4/127.0.0.1/tcp/8080/tls;sni=example.com;/http;hostname=something-else.com
    • something else?

@tomaka
Member

tomaka commented Aug 15, 2019

open question: how to represent protocol-specific parameters?

I think this question has been open for almost 2 years now, and I've been in at least 2 meetings that extensively discussed this exact thing. Maybe it is time for a decision, which is arbitrary anyway, rather than raising the question again.

@ntninja
Contributor

ntninja commented Aug 21, 2019

@lidel: Your “two” proposals are both technically identical to my proposal in #87, except that one of them uses a different (and IMHO less readable) string representation for the MultiAddr. If there is now consensus that we need an “arbitrary parameter” extension for MultiAddr either way, that means we now just need to decide on the syntax, right?
My main objection to the second textual MultiAddr representation you mentioned is that it's hard to read a string such as /ip4/127.0.0.1/tcp/8080/http/kv/host/example.com/kv/base/"/api/v2"/kv/user/joe/kv/password/joepass: It's not very obvious that the /kv thing has any special connection to the preceding /http item and finding the actual end of the /http item becomes error-prone for the casual human observer. Other than that they are totally equivalent, ofc.

@lidel
Member

lidel commented Aug 22, 2019

@Alexander255 yes, those were just quick examples, I prefer the first one more as well :)

Thank you for creating #87, it states the problem space nicely.
Let's move this discussion there (/http is only one of many protocols that could use parameter support)

ps. I believe if someone wants to work on this, improving multiaddr SPEC would be the first step 🙏

@infinity0

Just wanted to point out that /unix supports both stream and datagram socket types, but not protocols such as tcp: you can't create a Unix-domain socket with any protocol number other than 0. So to represent http you probably want something like /unix/$path/stream/http/etc or maybe /unix/stream/$path/http/etc.
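A quick check of this on a POSIX system, assuming `AF_UNIX` is available to Python's `socket` module. `unix_socket_supports` is a hypothetical helper, and it only tries TCP's protocol number, not every non-zero value:

```python
import socket


def unix_socket_supports(sock_type, proto):
    """Return True if the OS lets us create an AF_UNIX socket of this
    type with this protocol number."""
    try:
        s = socket.socket(socket.AF_UNIX, sock_type, proto)
    except OSError:
        return False
    s.close()
    return True


for kind in ("SOCK_STREAM", "SOCK_DGRAM"):
    sock_type = getattr(socket, kind)
    print(kind, "proto 0:", unix_socket_supports(sock_type, 0),
          "| IPPROTO_TCP:", unix_socket_supports(sock_type, socket.IPPROTO_TCP))
```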

@MinimalArchitect

This issue has been pretty stale in recent years, even though it is one of the biggest open issues with the multiaddr scheme...

I suggest not over-complicating the addressing scheme with parentheses.

Proposed format

Unix paths treat '//' the same as '/', which is technically fine and can be read as '/./', since the '/' symbol itself does not name a file (or directory, etc.). A multiaddr addressing scheme does not have this leeway: the implementations I have looked at treat '//' as an encoding error, since only single '/' separators are allowed. This could be changed so that unix (http, etc.) scans for a '//' token, which marks the end of the path.

/ip4/127.0.0.1/udp/1234/tls/auth/user=xxx,password=yyy/unix/werewr/aerw/document.socket//p2p/blablabla...
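A sketch of this '//' terminator rule for a path protocol. `split_path_protocol` is hypothetical; it assumes the first '/unix/' occurrence is the component in question and that the path itself never contains '//'.

```python
def split_path_protocol(addr, proto="unix"):
    """Split addr at a path protocol whose value ends at the first '//'.
    Returns (prefix up to the protocol name, path, remaining multiaddr)."""
    marker = f"/{proto}/"
    start = addr.find(marker)
    if start == -1:
        raise ValueError(f"no /{proto} component")
    value_start = start + len(marker) - 1  # keep the path's leading '/'
    prefix = addr[:value_start]
    end = addr.find("//", value_start)
    if end == -1:
        return prefix, addr[value_start:], ""  # path runs to the end
    return prefix, addr[value_start:end], addr[end + 1:]


print(split_path_protocol("/unix/werewr/aerw/document.socket//p2p/abc"))
```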

The more correct way

A path specification on a single domain (file system, etc.) does not have to deal with switching protocols. To support switching, one could encode every part of the path with its own protocol, e.g.

/ip4/127.0.0.1/udp/1234/tls/auth/user=xxx,password=yyy/unix/path/werewr/path/aerw/path/document.socket/p2p/blablabla...

This is tedious and I don't recommend it.
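For comparison, the per-segment encoding is mechanical to produce and decode, which also shows how verbose it gets. Both helpers are hypothetical:

```python
def encode_path_segments(path):
    """Encode '/a/b/c' as '/path/a/path/b/path/c'."""
    return "".join(f"/path/{seg}" for seg in path.strip("/").split("/"))


def decode_path_segments(components):
    """Rebuild the path from consecutive 'path' components of a split
    multiaddr; returns (path, index of the first non-path component)."""
    segs, i = [], 0
    while i + 1 < len(components) and components[i] == "path":
        segs.append(components[i + 1])
        i += 2
    return "/" + "/".join(segs), i


encoded = encode_path_segments("/werewr/aerw/document.socket")
print(encoded)
print(decode_path_segments((encoded + "/p2p/abc").strip("/").split("/")))
```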

@MarcoPolo
Contributor

This should be resolved with libp2p/specs#550

@achingbrain achingbrain mentioned this issue Mar 27, 2024