Got resolves streamed request before server responds #1481
I also tested with node@14.13.0 and the issue persists.
Got is not going to wait for the response from the server because there's no one to read it. If you're interested in the response, you should add something to "read" it. Example:

```js
const stream = require('stream');
const fs = require('fs');
const got = require('got');
const util = require('util');

(async () => {
  await util.promisify(stream.pipeline)(
    fs.createReadStream('./test.txt'),
    got.stream('http://localhost:8888', {method: 'PUT'}).on('finish', () => {
      console.log('finish');
    }),
    new stream.PassThrough()
  );

  console.log('Pipeline END');
})();
```

Please refer to #1269 for more information.
@Giotino This is incorrect, because if the server replies with an HTTP error or times out, Got will not fail the request. This is a major problem for the reliability of Got. The response can of course be discarded, but it should be waited for, in order to detect whether the request was successful and to make sure that the server processed it. For example, if I remember correctly, in PHP if the client is gone (closed connection) then the execution of the script is halted. This means that the server will never process the request.
I agree with you, but this is going to change the behavior of Got. How long should we wait before judging the meaning of the response? Currently I think the best way to handle this is leaving everything as it is and having the user add the PassThrough stream if they care about the response. Feel free to propose some other way to handle things, but please don't break something else while doing it.
@Giotino Thanks for replying
I think most people assume that Got will throw an error if a 4xx or 5xx status code is sent by the server. Making the request by passing the body or by using a stream should not change that behavior; the stream is just a way to avoid keeping a lot of data in memory, and it should provide the same level of reliability.
The request is finished only if the server says it has received it. If we don't know what the server said, we cannot assume it is finished: we may have the wrong URL or method, or the server may have timed out. I see two options:
I honestly don't know which is better, but maybe we can ask more people who are familiar with the HTTP protocol specification.
I disagree, but even in that case this is not documented. There should be a clear warning in the example saying that Got won't detect any HTTP response error, nor any connection drop during the transmission of the response. I really like this library. It has a very good API.
@Giotino I forgot to mention that in the case of a persistent connection the whole body should be read to allow the server to respond to the next request in the queue.
I think that your reply is a good starting point to discuss this behavior. I agree with @aalexgabi when he says that the pipeline should wait for the server response. If someone wants an event for "finished writing the request" we can add an event replacing the current one.
I cannot reproduce:

```js
const got = require('got');
const toReadableStream = require('to-readable-stream');
const {promisify} = require('util');
const pipeline = promisify(require('stream').pipeline);

(async () => {
  await pipeline(
    toReadableStream('hello world'),
    got.stream.put('http://localhost:8888').on('request', request => {
      request.once('socket', socket => {
        socket.once('end', () => {
          console.log('socket ended');
        });
      });
    })
  );
})();
```
I don't think it does. Can you please post some reproducible code?
We shouldn't at all. We just pass the data straight to Node.js.
Yep, and that's what we're doing right now.
Launch server:

```sh
netcat -l 8888
```

Run:

```js
const fs = require('fs');
const got = require('got');
const {promisify} = require('util');
const pipeline = promisify(require('stream').pipeline);

(async () => {
  try {
    await pipeline(
      fs.createReadStream('/proc/cpuinfo'),
      got.stream.put('http://localhost:8888'),
    );
    console.log('resolved');
  } catch (err) {
    console.log('rejected', err);
  }
})();
```

Prints:

```
% node t.js
resolved
^C
```

Not only does it resolve, but it also hangs the process, which means there is some kind of leak.
Why wouldn't it? You just piped
No, it waits for the timeout. It defaults to 30s, then it errors. I'm going to close this issue for now. If you've got any other questions, feel free to continue the discussion here.
Aside: I think there is a Node.js bug - it leaves a hanging
@szmarczak The server didn't say anything, so how do you know it was sent and not buffered by a TCP proxy, or, even better, that the server wouldn't have returned a 307 redirect? The only way to know is by waiting for the HTTP status code. The pipeline should have failed with a timeout error. This is how streams work: you wait for the last flush of buffers to disk, and when the hardware confirms that the data is saved, you consider the write successful. In the same way, you need confirmation from the server that the request has been processed in order to consider it successful.
From the point of view of the TCP protocol, yes, the data has been sent successfully. But an HTTP client should talk HTTP, not TCP. If a server crashes once every 1000 requests, Got will lose data at the same rate, because the client does not retry the request when it should.
No, because all it did is send a payload to the server, not receive one. We don't care if the server has processed it or not. That's not how it works. The stream does indeed throw a timeout error, but you need to catch it yourself - see the issue attached above.
It's up to the server to acknowledge the payload, not the client. The server may as well respond and ignore the payload completely (e.g. it can send 200 OK and close the connection before the entire payload was transmitted).
Please include some reproducible code.
We don't know if the payload has been received by the server because we did not get any acknowledgment from it. Simply being able to write the data to a TCP socket does not mean that it reached its destination server; it may have been buffered by a network proxy. It also does not mean that the server was able to process it: the server may have crashed or had a temporary database connectivity issue.
As a developer I do care, because it's my responsibility not to lose any data. As long as I don't have a 2xx response code, I don't know if the data has been received. I'm not going to assume anything.
The usage of pipeline is an implementation detail of how streams work in Node.js. From a functional perspective, the upload stream is not done, because we don't have a response code. The finish event should be delayed until the response headers are read and processed. If a 2xx code is received, finish can be emitted. If a 3xx is received, the redirect must be followed. If a 5xx or 4xx is received, the stream should emit an error.
Yes, but it's the client's responsibility to read status codes and report them to the user/caller.
Sure, but that's the server's problem, because it's the server's responsibility to return the correct status code.
I'm not going to spend more time on this. Imagine: a server reads the data from the TCP socket and then tries to connect to the database, but the connection fails. The client would have retried if it had received a 500 error code. It's quite easy to imagine if someone tries. @szmarczak In the meantime I'm using
Got can work the same way as `node-fetch`:

```js
const {createReadStream} = require('fs');
const fetch = require('node-fetch');

const stream = createReadStream('input.txt');

(async () => {
  const response = await fetch('https://httpbin.org/post', {method: 'POST', body: stream});
  const json = await response.json();
  console.log(json);
})();
```

```js
const {createReadStream} = require('fs');
const got = require('got');

const stream = createReadStream('input.txt');

(async () => {
  const response = got('https://httpbin.org/post', {method: 'POST', body: stream});
  const json = await response.json();
  console.log(json);
})();
```

Or at least it should, but it doesn't work when the stream size is unknown (it should send a chunked body, but instead it sets the `content-length` header). @szmarczak I think we have a new bug.
If it doesn't then you'll get a
Then you're on your own, Got won't delay the
You're not right. https://nodejs.org/api/stream.html#stream_event_finish
Flushed doesn't mean acknowledged.
We offer everything that
@Giotino Can you open a new issue for this, please? Sorry, I reopened this one by accident.
@Giotino For files we do (see got/source/core/utils/get-body-size.ts, lines 30 to 33 at c31366b). So if your file is 0 bytes then it is correct and it's no bug.
@Giotino Thank you. I didn't know that you can pass a stream for the body property. It's not mentioned in the "Stream API" section of the documentation, so I was unaware this is possible: https://www.npmjs.com/package/got#streams

I did a test and it works as expected. Server that times out:

```sh
netcat -l 8888
```

Code:

```js
const {createReadStream} = require('fs');
const got = require('got');

(async () => {
  try {
    await got('http://localhost:8888', {
      method: 'POST',
      body: createReadStream('/proc/cpuinfo'),
      timeout: 5000,
    });
    console.log('resolves');
  } catch (err) {
    console.log('rejects', err.message);
  }
})();
```

Prints:

```
% node t.js
rejects Timeout awaiting 'request' for 5000ms
%
```

Now the question is what makes the difference between using a pipeline and passing the stream body. From a functional perspective it should be the same. At the very least, I think it's important to document that using pipeline is not reliable when it comes to detecting timeouts and HTTP errors, and that a stream should be passed to the body property instead. Currently, using the code from the example is problematic because of this.
@aalexgabi A small note: currently, due to a bug, you cannot send files that have size Regarding the missing stream documentation: it's documented on the Regarding the
API stands for Application Programming Interface, and the interface Got returns is either a Promise or a Stream. It's as simple as that. The
We already link to the Node.js docs about
Yeah, but it doesn't talk about the behavior described by this issue. At least we can modify the pipeline example to something like the following:

```js
const stream = require('stream');
const {promisify} = require('util');
const fs = require('fs');
const got = require('got');

const pipeline = promisify(stream.pipeline);

(async () => {
  await pipeline(
    got.stream('https://sindresorhus.com'),
    fs.createWriteStream('index.html')
  );

  // For POST, PUT, and PATCH methods `got.stream` returns a `stream.Writable`
  await pipeline(
    fs.createReadStream('index.html'),
    got.stream.post('https://sindresorhus.com'),
    new stream.PassThrough() // OPTIONAL: we use a PassThrough in order to catch errors in the response (e.g. HTTP 500)
  );
})();
```
Again, only one click is required to learn more :P
Yep, that sounds good. Feel free to send a PR.
**Describe the bug**

**Actual behavior**

got resolves a stream upload request without waiting for the server to reply.

I have reported a somewhat similar issue in the past here: #1026

**Expected behavior**

got should wait for the server to respond.

**Code to reproduce**

1. Create a server that never responds
2. Upload a small file
3. The finish event will be generated

I inspected the network traffic and it seems that got closes the connection (almost immediately):

![image](https://user-images.githubusercontent.com/600821/94794907-dec3e280-03dc-11eb-8209-382d0d3fd98d.png)