Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inconsistent gRPC-Web behavior #15

Open
DazWilkin opened this issue Nov 14, 2024 · 10 comments
Open

Inconsistent gRPC-Web behavior #15

DazWilkin opened this issue Nov 14, 2024 · 10 comments
Labels
bug Something isn't working

Comments

@DazWilkin
Copy link

DazWilkin commented Nov 14, 2024

Using tonic-web with any (but e.g. ElizaService) client.rs (as-is URL changes aside) against FauxRPC:

Error: Status { code: Unknown, message: " ", source: None }

Even though fauxrpc run --addr=localhost:6660 --schema=eliza.binpb:

2024/11/14 11:39:28 INFO MethodCalled service=connectrpc.eliza.v1.ElizaService method=Say
2024/11/14 11:39:28 | 200 |  3.228239993s |    100.65.15.44 | POST     "/connectrpc.eliza.v1.ElizaService/Say"

Reconfiguring the rust client to use https://demo.connectrpc.com:

Response { metadata: MetadataMap { headers: {"content-type": "application/grpc-web", "grpc-accept-encoding": "gzip", "vary": "Origin", "date": "Thu, 14 Nov 2024 19:46:39 GMT", "server": "Google Frontend", "traceparent": "00-6d243811558d7e0f0f15a46b4cefb357-60228e18328eeff4-01", "x-cloud-trace-context": "6d243811558d7e0f0f15a46b4cefb357/6927255411427831796;o=1", "via": "1.1 google", "alt-svc": "h3=\":443\"; ma=2592000,h3-29=\":443\"; ma=2592000", "transfer-encoding": "chunked"} }, message: SayResponse { sentence: "How do you feel when you say that?" }, extensions: Extensions }

In the latter case, I'm having to rewire the client to use TLS too but, using tailscale serve --https=443 6660, I can put a TLS proxy in front of FauxRPC too and continue to receive:

Error: Status { code: Unknown, message: " ", source: None }

All the while FauxRPC reports success (200) on the POST /.../ElizaService/Say.

This suggests there's something inconsistent in FauxRPC's behavior that's not present in ConnectRPC.

I've been flummoxed by this.

I used Kreya because it supports gRPC-Web and it works correctly with both the ConnectRPC and FauxRPC endpoints.

Yesterday, I used socat to try and capture the traffic using the tonic-web's helloworld.proto against FauxRPC.

With minor differences, there's no obvious (to me breaking) difference:

client.rs generates the Error:

> 2024/11/13 12:12:49.000320511  length=171 from=0 to=170
POST /helloworld.Greeter/SayHello HTTP/1.1\r
te: trailers\r
content-type: application/grpc-web\r
host: localhost:12345\r
transfer-encoding: chunked\r
\r
E\r
....	
\aFreddie\r
0\r
\r
< 2024/11/13 12:12:51.000759623  length=145 from=0 to=144
HTTP/1.1 200 OK\r
Content-Type: application/grpc-web+proto\r
Date: Wed, 13 Nov 2024 20:12:51 GMT\r
Transfer-Encoding: chunked\r
\r
e\r
....	
\aFreddie\r
< 2024/11/13 12:12:51.000760926  length=48 from=145 to=192
25\r
.... Grpc-Message: \r
Grpc-Status: 0\r
\r
0\r
\r

Kreya responds with Ok:

> 2024/11/13 12:14:40.000795916  length=216 from=0 to=215
POST /helloworld.Greeter/SayHello HTTP/1.1\r
Host: localhost:12345\r
TE: trailers\r
grpc-accept-encoding: identity,gzip,deflate\r
Transfer-Encoding: chunked\r
Content-Type: application/grpc-web\r
\r
E\r
....	
\aFreddie\r
0\r
\r
< 2024/11/13 12:14:42.000968834  length=145 from=0 to=144
HTTP/1.1 200 OK\r
Content-Type: application/grpc-web+proto\r
Date: Wed, 13 Nov 2024 20:14:42 GMT\r
Transfer-Encoding: chunked\r
\r
e\r
....	
\aFreddie\r
< 2024/11/13 12:14:42.000970169  length=48 from=145 to=192
25\r
.... Grpc-Message: \r
Grpc-Status: 0\r
\r
0\r
\r
@DazWilkin
Copy link
Author

As an aside, I believe, while gRPC-Web is supported by FauxRPC, CORS is not. Is that correct?

The consequence is that gRPC-Web clients except browsers are supported.

@sudorandom
Copy link
Owner

sudorandom commented Nov 14, 2024

I don't have time to dig deeply into this right this moment, but I can throw a couple suggestions:

  • Maybe you can use the JSON encoding just to make it easier to verify that nothing weird is happening with HTTP responses.
  • If possible, capturing the raw bytes might give you hints. gRPC (and gRPC-Web) have framing bits before each protobuf message which tells you the message size and if it's compressed. These framing bytes could be and be impossible to see with your package current method of packet capture. The tools I've used for this are pcap and/or wireshark. If you have a "working" and "broken" pcap file I can help you figure out the differences, if any.
  • Maybe try disabling gzip with tonic-web? The grpc-accept-encoding header being different may be a big hint at compression being different

I can easily add CORS support in to make this easier. You could theoretically use a load balancer to serve up a UI and FauxRPC with the same host. In which case, CORS doesn't really matter or you could have the load balancer serve up the CORS headers... but I can see how that can be extra work, so this is a good feature to add directly to FauxRPC. Can you make a different github issue for that to track it separately?

@DazWilkin
Copy link
Author

Thanks for the feedback and suggestions; I'll give them a whirl.

I'm running the Web server and FauxRPC on localhost; CORS rears its head because of the different ports, I'd thought?

I'll open an issue for CORS support.

@sudorandom
Copy link
Owner

From looking at wireshark output, there's a hint of what is happening.

Here is what it looks like when using the rust server:
Screenshot 2024-11-16 at 08 15 48

Here is FauxRPC:
Screenshot 2024-11-16 at 08 17 04

Notice that there's one more packet with FauxRPC. That's because FauxRPC is sending the trailers after it sends the data. For whatever reason the implementation is splitting the response message and the trailers into multiple packets. It appears like tonic doesn't handle this very well. I think both scenarios are valid, so I'm leaning on this being an issue with the client side of tonic_web.

@sudorandom
Copy link
Owner

I'm out of my depth with rust, but I believe we might learn more about this issue from debugging this poll_frame function: https://github.com/hyperium/tonic/blob/master/tonic-web/src/call.rs#L258. This reads to me like it is supposed to be reading frames until it reads the trailers... but it appears like something is off either with the logic or the data.

@sudorandom
Copy link
Owner

I've found the issue. tonic-web appears to be parsing the trailers wrong in some cases... it's seeing an extra space with FauxRPC. This is basically what the trailers look like from FauxRPC:

grpc-message: 
grpc-status: 0

The tonic_web server does this:

grpc-status:0

You don't see it in your output because it's not actually printing the raw trailers.

Note the spaces after the separating colon :. These spaces are dictated by the grpc-web protocol spec so the tonic-web gRPC-Web client should handle the space after the colon. I think many client implementations deal with the space being absent or present, which why this flaw may have gone unnoticed. The tonic-web client is also likely mostly tested exclusively with tonic-web server, and those both don't use a space there. Again, I'm not a rust dev, so I'm not sure how effective I can be at making a patch to tonic-web 😅 I may try to attempt the fix if I find the time 😅

Thanks for this issue though, this has been a fun learning experience! 🦀

@sudorandom
Copy link
Owner

I filed a PR for tonic. From my testing it fixes your issue 😀

@sudorandom sudorandom added the bug Something isn't working label Nov 16, 2024
@DazWilkin
Copy link
Author

Wow!

Thank you very much. I saw the PR. I'll try it out.

It makes sense that they're only testing against Rust (tonic) servers.

Does this explain why the Rust client worked with Connect's public Eliza service?

@sudorandom
Copy link
Owner

Because this is only an issue with HTTP/1.1 and I think the Rust client would connect using HTTP/2 with demo.connectrpc.com. HTTP/2 has binary framing which is much more strict so this isn't an issue there. The usual behavior for HTTP clients is to never use HTTP/2 on non-TLS connections.

This, however, can be forced by using a feature called "h2c" (sometimes referred to as http2 prior knowledge). It does look like tonic supports it... but I'm not going to lie, it looks oddly complicated to use... To test this fully you'd want to take the h2c example and adapt it to use grpc-web instead of just gRPC (because this trailer parsing logic is exclusive to gRPC-Web).

@sudorandom
Copy link
Owner

Oh wait, I'm actually wrong about HTTP/2 here. Even with HTTP/2, gRPC-Web trailers are passed in the body... so your question still stands at why the behavior is different.

I will note that FauxRPC actually uses Vanguard to enable gRPC/gRPC-Web/Connect/REST protocols... So the implementation is actually different. gRPC-Web trailers may be different between ConnectRPC and Vanguard.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants