Got resolves streamed request before server responds #1481

Closed
1 task done
aalexgabi opened this issue Oct 1, 2020 · 28 comments

@aalexgabi

Describe the bug

  • Node.js version: v13.7.0
  • OS & version: Manjaro 20
  • got version: 11.7.0

Actual behavior

got resolves a stream upload request without waiting for the server to reply.
I have reported a somewhat similar issue in the past here: #1026

Expected behavior

got should wait for the server to respond

Code to reproduce

Create a server that never responds

netcat -l 8888

Upload a small file

await pipeline(
  fs.createReadStream('./file'),
  got.stream('http://localhost:8888', { method: 'PUT' }).on('finish', () => {
    console.log('finish');
  }),
);

The finish event will be generated

I inspected the network traffic and it seems that got closes the connection (almost immediately):
[screenshot: packet capture showing got closing the connection almost immediately]

Checklist

  • I have read the documentation.
  • I have tried my code with the latest version of Node.js and Got.
@aalexgabi
Author

I also tested with node@14.13.0 and the issue persists.

@aalexgabi aalexgabi changed the title Got resolves stramed request before server responds Got resolves streamed request before server responds Oct 1, 2020
@Giotino
Collaborator

Giotino commented Oct 1, 2020

Got is not going to wait for the response from the server because there is nothing reading it; if you're interested in the response, you should add something to "read" it.

Example

const stream = require('stream');
const fs = require('fs');
const got = require('got');
const util = require('util');

(async () => {
  await util.promisify(stream.pipeline)(
    fs.createReadStream('./test.txt'),
    got.stream('http://localhost:8888', { method: 'PUT' }).on('finish', () => {
      console.log('finish');
    }),
    new stream.PassThrough()
  );

  console.log('Pipeline END');
})();

Please refer to #1269 for more information.

@Giotino Giotino closed this as completed Oct 1, 2020
@aalexgabi
Author

@Giotino This is incorrect, because if the server replies with an HTTP error or times out, Got will not fail the request.

This is a major problem for the reliability of Got. The response can of course be discarded, but it should be waited for, in order to detect whether the request was successful and to make sure that the server processed it.

For example, if I remember correctly, in PHP, if the client is gone (closed connection), the execution of the script is halted. This means that the server will never fully process the request.
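
For reference, a minimal sketch of "wait for the response, then discard it", based on the PassThrough workaround shown above (the local URL and ./file are placeholders): the extra sink forces the response to be consumed, so HTTP errors and timeouts reject the pipeline.

const stream = require('stream');
const fs = require('fs');
const got = require('got');
const { promisify } = require('util');

const pipeline = promisify(stream.pipeline);

(async () => {
  try {
    await pipeline(
      fs.createReadStream('./file'),
      got.stream.put('http://localhost:8888'),
      new stream.PassThrough(), // consume the response so errors and timeouts surface
    );
    console.log('server acknowledged the upload');
  } catch (err) {
    console.log('upload failed:', err.message);
  }
})();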

@Giotino
Collaborator

Giotino commented Oct 1, 2020

I agree with you, but this is going to change the behavior of Got.

How long should we wait before finishing the request?
A correct answer would be until the start of the response body, or at least the end of the first response line (I think).

Judging by the meaning of the finish event, it should be emitted when the request is finished, but here we are going to emit it right after the start of the response.

Currently I think the best way to handle this is leaving everything as it is and having the user add the PassThrough stream if they care about the response.

Feel free to propose some other way to handle things, but please don't break something else while doing it.

@aalexgabi
Author

@Giotino Thanks for replying

I agree with you, but this is going to change the behavior of Got.

I think most people assume that Got will throw an error if a 4xx or 5xx status code is sent by the server. Doing the request by passing the body or by using a stream should not change that behavior. The stream is just a solution to avoid keeping a lot of data in memory. It should provide the same level of reliability.

Judging by the meaning of the finish event, it should be emitted when the request is finished, but here we are going to emit it right after the start of the response.

The request is finished if the server says it has received it. If we don't know what the server said, we cannot assume it is finished. We may have the wrong URL or method, or the server may have timed out.

I see two options:

  • when \r\n\r\n is received (end of headers) the connection is closed and the received headers are processed as usual
  • when the last byte of the response is received (the server may log an error if the connection is dropped during the transmission of the response)

I honestly don't know which is better, but maybe we can ask more people who are familiar with the HTTP protocol specification.

Currently I think the best way to handle this is leaving everything as it is and having the user add the PassThrough stream if they care about the response.

I disagree, but even in that case this is not documented. There should be a clear warning in the example saying that Got won't detect any HTTP response error, nor any connection drop during the transmission of the response.

I really like this library. It has a very good API.
This kind of behavior from an HTTP library is frustrating and time-consuming.
I want to congratulate everybody working on the project for providing the simplest API ever.

@aalexgabi
Author

@Giotino I forgot to mention that in the case of a persistent connection, the whole response body should be read, to allow the server to respond to the next request in the queue.
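
A small sketch of what I mean, assuming a keep-alive agent (the URL is a placeholder): the response body has to be drained before the socket can serve the next queued request.

const got = require('got');

const response = got.stream('http://localhost:8888');

// Drain the response body; on a persistent connection the socket
// cannot be reused until the body has been fully consumed.
response.resume();
response.on('end', () => {
  console.log('body fully read; the socket is free for the next request');
});
response.on('error', err => console.error(err.message));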

@Giotino
Collaborator

Giotino commented Oct 1, 2020

I think that your reply is a good starting point for discussing this behavior.
I would like to have some comments from @sindresorhus @szmarczak.
Since we're working on Got 12, we can make it more user-friendly by changing the meaning of the finish event, as pointed out by @aalexgabi.

I agree with @aalexgabi when he says that the pipeline should wait for the server response. If someone wants an event for "finished writing the request", we can add one replacing the current finish.

@Giotino Giotino reopened this Oct 1, 2020
@szmarczak
Collaborator

I cannot reproduce:

const http = require('http');
const got = require('got');
const toReadableStream = require('to-readable-stream');
const {promisify} = require('util');

const pipeline = promisify(require('stream').pipeline);

(async () => {
    await pipeline(
        toReadableStream('hello world'),
        got.stream.put(`http://localhost:8888`).on('request', request => {
            request.once('socket', socket => {
                socket.once('end', () => {
                    console.log('socket ended');
                });
            });
        })
    );
})();

socket ended never gets called. Got never closes the socket by itself. I don't think Node.js closes the socket on .end().

I think most people assume that Got will throw an error if a 4xx or 5xx status code is sent by the server. Doing the request by passing the body or by using a stream should not change that behavior.

I don't think it does. Can you post some reproducible code, please?

@szmarczak
Collaborator

@Giotino

How long should we wait before finishing the request?

We shouldn't at all. We just pass the data straight to Node.js.

Currently I think the best way to handle this is leaving everything as it is and having the user add the PassThrough stream if they care about the response.

Yep, and that's what we're doing right now.

@aalexgabi
Author

Launch server:

netcat -l 8888

Run

const fs = require('fs');
const got = require('got');
const { promisify } = require('util');

const pipeline = promisify(require('stream').pipeline);

(async () => {
  try {
    await pipeline(
      fs.createReadStream('/proc/cpuinfo'),
      got.stream.put('http://localhost:8888'),
    );

    console.log('resolved');
  } catch (err) {
    console.log('rejected', err);
  }
})();

Prints:

% node t.js
resolved
^C

Not only does it resolve, but it also hangs the process, which means there is some kind of leak.

@szmarczak
Collaborator

Not only does it resolve

Why wouldn't it? You just piped /proc/cpuinfo to got.stream.put(). That means that the payload was sent successfully.

hangs the process

No, it waits for the timeout. It defaults to 30s, then it errors.

I'm going to close this issue for now. If you've got any other questions, feel free to continue the discussion here.

@szmarczak
Collaborator

szmarczak commented Oct 1, 2020

Aside: I think there is a Node.js bug: it leaves a hanging error handler on the last item, in this case got.stream.put(). The process just exits, when it should throw an uncaught exception instead. See nodejs/node#35452

@aalexgabi
Author

That means that the payload was sent successfully.

@szmarczak The server didn't say anything, so how do you know it was sent and not buffered by a TCP proxy, or, even better, that the server wouldn't have returned a 307 redirect? The only way to know that is by waiting for the HTTP status code. The pipeline should have failed with a timeout error.

This is how streams work: you wait for the last flush of buffers to the disk. When the hardware confirms that the data is saved, you consider the write successful. In the same way, you need confirmation from the server that the request has been processed in order to consider it successful.

@aalexgabi
Author

From the point of view of the TCP protocol, yes, the data has been sent successfully.
But from the point of view of the HTTP protocol, we don't know yet if the data has been sent successfully.

An HTTP client should talk HTTP, not TCP.

If a server crashes once every 1000 requests, got will lose data at the same rate, because the client does not retry the request when it should.

@szmarczak
Collaborator

The pipeline should have failed with a timeout error.

No, because all it did was send a payload to the server, not receive one. We don't care if the server has processed it or not. That's not how pipeline works. Please open a Node.js issue if you want to discuss the pipeline behavior.

The stream does, indeed, throw a timeout error, but you need to catch it yourself; see the issue linked above.
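
To illustrate (a sketch, not code from the issue itself): the timeout error is emitted on the Got stream after the pipeline promise has already resolved, so it has to be handled with a separate listener.

const fs = require('fs');
const got = require('got');
const { promisify } = require('util');

const pipeline = promisify(require('stream').pipeline);

(async () => {
  const request = got.stream.put('http://localhost:8888', { timeout: 5000 });

  // The pipeline resolves once the request body is flushed; the timeout
  // error is emitted later, on the Got stream itself.
  request.on('error', err => {
    console.log('request errored:', err.message);
  });

  await pipeline(fs.createReadStream('/proc/cpuinfo'), request);
  console.log('pipeline resolved');
})();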

@szmarczak
Collaborator

An HTTP client should talk HTTP, not TCP.

It's up to the server to acknowledge the payload, not the client. The server may as well respond and ignore the payload completely (e.g. it can send 200 OK and close the connection before the entire payload was transmitted).
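
For illustration, a hypothetical server that does exactly that: it acknowledges immediately, without ever reading the payload.

const http = require('http');

// Replies 200 OK right away and never reads the request body.
http.createServer((req, res) => {
  res.writeHead(200);
  res.end('OK');
}).listen(8888);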

@szmarczak
Collaborator

If a server crashes once every 1000 requests, got will lose data at the same rate, because the client does not retry the request when it should.

Please include some reproducible code.

@aalexgabi
Author

No, because all it did was send a payload to the server, not receive one.

We don't know if the payload has been sent to the server, because we did not receive any acknowledgment from the server. Simply being able to write the data to a TCP socket does not mean that it reached its destination server. It may have been buffered by a network proxy. Nor does it mean that the server was able to process it. The server may have crashed or had a temporary database connectivity issue.

We don't care if the server has processed it or not

As a developer I do care because it's my responsibility not to lose any data. As long as I don't have a 2xx response code, I don't know whether the data has been received. I'm not going to assume anything.

that's not how pipeline works

The usage of pipeline is an implementation detail of how streams work in Node.js. From a functional perspective, the upload stream is not done, because we don't have a response code. The finish event should be delayed until the response headers are read and processed. If a 2xx code is received, finish can be emitted. If a 3xx is received, the redirect must be followed. If a 4xx or 5xx is received, the stream should emit an error event.

It's up to the server to acknowledge the payload, not the client.

Yes, but it's the client's responsibility to read status codes and report them to the user/caller.

The server may as well respond and ignore the payload completely (e.g. it can send 200 OK and close the connection before the entire payload was transmitted).

Sure, but that's the server's problem, because it's the server's responsibility to return the correct status code.

Please include some reproducible code.

I'm not going to spend more time on this. A server reads the data from the TCP socket and then tries to connect to the database, but the connection fails. The client would have retried if it had received a 500 error code. It's quite easy to imagine if someone tries.

@szmarczak In the meantime I'm using node-fetch every time I need to stream data over HTTP.

@Giotino
Collaborator

Giotino commented Oct 4, 2020

@szmarczak In the meantime I'm using node-fetch every time I need to stream data over HTTP.

@aalexgabi

Got can work the same way as node-fetch for sending data.

node-fetch

const {createReadStream} = require('fs');
const fetch = require('node-fetch');

const stream = createReadStream('input.txt');

(async () => {
	const response = await fetch('https://httpbin.org/post', {method: 'POST', body: stream});
	const json = await response.json();
	
	console.log(json);
})();

got

const {createReadStream} = require('fs');
const got = require('got');

const stream = createReadStream('input.txt');

(async () => {
	const response = got('https://httpbin.org/post', {method: 'POST', body: stream});
	const json = await response.json();
	
	console.log(json);
})();

Or at least it should, but it doesn't work when the stream size is unknown (it should send a chunked body, but instead it sets the content-length to 0).

@szmarczak I think we have a new bug

@szmarczak
Collaborator

szmarczak commented Oct 4, 2020

Simply being able to write the data to a TCP socket does not mean that it reached its destination server.

If it doesn't, then you'll get an ECONNRESET error. You are responsible for catching it; got.stream() returns a pure Node.js stream, and the way you use pipeline, it just flushes the data. You expect different behavior, but that's how Node.js behaves, and we aren't here to discuss the default Node.js behavior.

As a developer I do care because it's my responsibility not to lose any data.

Then you're on your own; Got won't delay the finish event just because you wish it to. That's not how Node.js works.

The finish event should be delayed until the response headers are read and processed.

You're not right. https://nodejs.org/api/stream.html#stream_event_finish

Event: 'finish'
Added in: v0.9.4
The 'finish' event is emitted after the stream.end() method has been called, and all data has been flushed to the underlying system.

Flushed doesn't mean acknowledged.
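
A sketch of the distinction (the URL is a placeholder): finish fires when the request body has been flushed, while the response event, if it ever arrives, is the actual acknowledgment.

const got = require('got');

const request = got.stream.put('http://localhost:8888');

request.on('finish', () => console.log('request body flushed to the socket'));
request.on('response', () => console.log('server actually responded'));
request.on('error', err => console.error(err.message));

request.end('hello'); // write a small body and flush it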

@szmarczak szmarczak reopened this Oct 4, 2020
@szmarczak
Collaborator

In the meantime I'm using node-fetch every time I need to stream data over HTTP.

We offer everything that node-fetch has and even much more. See @Giotino's answer.

Or at least it should, but it doesn't work when the stream size is unknown (it should send a chunked body, but instead it sets the content-length to 0).

@Giotino can you open a new issue for this please? Sorry, reopened this one by accident.

@szmarczak
Collaborator

@Giotino For files we do

if (body instanceof ReadStream) {
	const {size} = await statAsync(body.path);
	return size;
}

So if your file is 0 bytes on stat, then the content-length of 0 is correct and it's not a bug.
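
A quick way to see this on Linux (a sketch): procfs files report size 0 on stat, which is why the snippet above computes a length of 0 for /proc/cpuinfo even though reading it yields data.

const { statSync, readFileSync } = require('fs');

console.log(statSync('/proc/cpuinfo').size);           // 0: procfs reports no size
console.log(readFileSync('/proc/cpuinfo').length > 0); // true: yet it has content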

@aalexgabi
Author

@Giotino Thank you. I didn't know that you can pass a stream for the body property. It's not mentioned in the "Stream API" section of the documentation, so I was unaware this was possible: https://www.npmjs.com/package/got#streams

I did a test and it works as expected.

Server that times out:

netcat -l 8888

Code:

const { createReadStream } = require('fs');
const got = require('got');

(async () => {
  try {
    await got('http://localhost:8888', {
      method: 'POST',
      body: createReadStream('/proc/cpuinfo'),
      timeout: 5000,
    });

    console.log('resolves');
  } catch (err) {
    console.log('rejects', err.message);
  }
})();

Prints:

% node t.js
rejects Timeout awaiting 'request' for 5000ms
% 

Now the question is: what makes it different between using a pipeline and passing the stream as the body? From a functional perspective it should be the same.

At the very least I think it's important to document that using pipeline is not reliable when it comes to detecting timeouts and HTTP errors, and that a stream should be passed to the body property instead. Currently, using the code from the example is problematic because of this.

@Giotino
Collaborator

Giotino commented Oct 5, 2020

@aalexgabi a small note: currently, due to a bug, you cannot send files that have size 0 on stat (e.g. files that are not "real files" on Linux): your example works, but it doesn't send any body (because you're sending /proc/cpuinfo).

Regarding the missing stream documentation: it's documented on the body option, but I agree with you that mentioning it in the stream section could be helpful.

Regarding the pipeline behavior: we should document that a user needs to add a stream.PassThrough if they care about the response (status code and errors). I know we are not here to teach users how to use Node.js, but I think this is a tricky case and we could save users a lot of time by writing two lines about this.

@szmarczak
Collaborator

API stands for Application Programming Interface, and the interface Got returns is either a Promise or a Stream. It's as simple as that. The body is just an option, not the API itself. It can be anything. I'm 100% sure that if you just opened the search box, typed stream, and kept pressing enter, you'd find what you were looking for.

@szmarczak
Collaborator

Regarding the pipeline behavior: we should document that a user needs to add a stream.PassThrough if they care about the response (status code and errors). I know we are not here to teach users how to use Node.js, but I think this is a tricky case and we could save users a lot of time by writing two lines about this.

We already link to the Node.js docs about pipeline. I don't think we need to copy them. It just requires one click to learn more about pipeline.

@Giotino
Collaborator

Giotino commented Oct 5, 2020

We already link to the Node.js docs about pipeline. I don't think we need to copy them. It just requires one click to learn more about pipeline.

Yeah, but it doesn't talk about the behavior described by this issue.

At least we can modify the pipeline example to something like the following

const stream = require('stream');
const {promisify} = require('util');
const fs = require('fs');
const got = require('got');

const pipeline = promisify(stream.pipeline);

(async () => {
    await pipeline(
        got.stream('https://sindresorhus.com'),
        fs.createWriteStream('index.html')
    );

    // For POST, PUT, and PATCH methods `got.stream` returns a `stream.Writable`
    await pipeline(
        fs.createReadStream('index.html'),
        got.stream.post('https://sindresorhus.com'),
        new stream.PassThrough() // OPTIONAL: we use a PassThrough in order to catch errors on the response (e.g. HTTP 500)
    );
})();

@szmarczak
Collaborator

Yeah, but it doesn't talk about the behavior described by this issue.

Again, only one click is required to learn more :P

At least we can modify the pipeline example to something like the following

Yep, that sounds good. Feel free to send a PR.
