[core-https] Implement HttpsClient interface for node and browser #9082
Everything builds cleanly for the moment.
Respect the skipDecompressResponse flag and support unzip on streams.
  }
}

if (!request.skipDecompressResponse) {
Do we want to clear the `Accept-Encoding` header for the `else` case?
I suppose we could... but what would the scenario be? The user sets it as a manual header in the operation or another policy in the pipeline sets it?
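For illustration, clearing the header in the `else` branch could look something like this minimal sketch. The interface and function names here are hypothetical stand-ins, not the PR's actual code:

```typescript
// Hypothetical sketch (prepareAcceptEncoding and HeaderedRequest are
// illustrative names, not the PR's code): clear Accept-Encoding when
// decompression is skipped so a manually-set header doesn't request
// compression the client won't undo.
interface HeaderedRequest {
  skipDecompressResponse?: boolean;
  headers: Map<string, string>;
}

function prepareAcceptEncoding(request: HeaderedRequest): void {
  if (!request.skipDecompressResponse) {
    // Advertise the encodings we are prepared to decompress.
    request.headers.set("Accept-Encoding", "gzip,deflate");
  } else {
    // Decompression is disabled, so drop any stale Accept-Encoding header.
    request.headers.delete("Accept-Encoding");
  }
}

const req: HeaderedRequest = {
  skipDecompressResponse: true,
  headers: new Map([["Accept-Encoding", "gzip"]]),
};
prepareAcceptEncoding(req);
```

This would cover both scenarios mentioned above (a manual header on the operation, or one set by another policy in the pipeline).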
Great stuff! I just have a couple minor comments
 * Get the JSON object representation of this HTTP header collection.
 */
public toJson(): RawHttpHeaders {
  return this.raw();
This and `raw()` do the same thing, right? Is this here for backwards compat, or is the plan for `toJson` to do something else in the future?
You're right, this is silly. I'm dropping `raw` and keeping only `toJSON`.
I also realized that it needs to be `toJSON` rather than `toJson` in order to make `JSON.stringify` do the right thing. I fixed that too.
Why is it different to have `toJson` vs `toJSON`?
`JSON.stringify` looks for a `toJSON` method when serializing to decide how to encode objects: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/JSON/stringify
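A minimal illustration of that hook, using a simplified stand-in class rather than the actual `HttpHeaders` implementation:

```typescript
// Simplified stand-in for a header collection, not the real HttpHeaders.
class HttpHeaders {
  private readonly _headers: Record<string, string> = {};

  public set(name: string, value: string): void {
    this._headers[name.toLowerCase()] = value;
  }

  // Because the method is named toJSON (not toJson), JSON.stringify uses
  // its return value instead of serializing the private fields directly.
  public toJSON(): Record<string, string> {
    return { ...this._headers };
  }
}

const headers = new HttpHeaders();
headers.set("Content-Type", "application/json");
console.log(JSON.stringify(headers)); // → {"content-type":"application/json"}
```

Rename `toJSON` to `toJson` and `JSON.stringify` would instead emit the private backing field (`{"_headers":{...}}`), since the hook is matched by exact name.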
  }, request.timeout);
}

if (request.formData) {
Do you think it would make sense to extract these request preparation steps into their own functions? I think it may help readability and make it easier to focus on the core of the function, which is sending the request.
Sure. Pulled this out into its own function.
try {
  const result = await new Promise<PipelineResponse>((resolve, reject) => {
    const agent = getOrCreateAgent(request);
    const url = new URL(request.url);
Same suggestion as above: some of these blocks could be extracted into their own functions to help readability. For example:

const options: https.RequestOptions = getRequestOptions(request);
...
const headers = getResponseHeaders(res);
pulled these out too
@joheredi @jeremymeng Are there any changes that are blocking your approval?
LGTM
@xirzec Sorry I missed the notification on this, I'll review now.
Looks good!
 * If the request is terminated, an `AbortError` is thrown.
 * Defaults to 0, which disables the timeout.
 */
timeout: number;
Just to be clear, is this timeout meant to be inclusive of retries?
The way this works today, each retry would use this timeout for its own request. E.g. if you set it to 2s and the retry policy tried 3 times, you would wait 6s total before completely failing.
If this is expected/desirable, I can update the comment.
/cc @bterlson
I was wrong about this -- the retry policy doesn't retry abort errors today, which is how the timeout is modeled.
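To illustrate the per-attempt semantics described above, here is a hypothetical sketch (not the PR's implementation): each attempt arms its own timer, so the total wait is timeout × attempts.

```typescript
// Hypothetical sketch of per-attempt timeout semantics: each call arms its
// own timer, so a retry policy making 3 attempts at 2s each can take up to
// 6s total before failing.
function sendWithTimeout<T>(
  attempt: () => Promise<T>,
  timeoutMs: number
): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => {
      reject(new Error("AbortError: the operation timed out"));
    }, timeoutMs);
    attempt().then(
      (value) => {
        clearTimeout(timer);
        resolve(value);
      },
      (error) => {
        clearTimeout(timer);
        reject(error);
      }
    );
  });
}
```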
let keepAliveAgent: https.Agent;
let proxyAgent: https.Agent;
Is this the right place to cache the agents? It seems like this could be problematic if there were multiple service clients that needed to connect to different proxies, or even if the proxySettings needed to change so that a client was recreated with the new settings.
Ah, you're right. In the existing implementation, these are cached on the class instance. I need to fix this!
Fixed in the latest version
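The shape of the fix, as a hypothetical sketch (class and member names are illustrative, not the PR's code): cache the agent on the client instance rather than at module scope, so two clients with different settings don't share or clobber each other's agents.

```typescript
import { Agent } from "https";

// Hypothetical sketch: per-instance agent caching. Each client lazily
// creates and reuses its own keep-alive agent instead of sharing one
// module-level variable across all clients.
class HttpsClientSketch {
  private cachedAgent?: Agent;

  public getOrCreateAgent(): Agent {
    if (!this.cachedAgent) {
      this.cachedAgent = new Agent({ keepAlive: true });
    }
    return this.cachedAgent;
  }
}

const clientA = new HttpsClientSketch();
const clientB = new HttpsClientSketch();
```

With this layout, recreating a client with new proxySettings naturally produces a fresh agent.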
}

function getRequestOptions(request: PipelineRequest): https.RequestOptions {
  const agent = getOrCreateAgent(request);
Are you planning on eventually letting a user pass in their own agent? I'd love to be able to do that so I can configure my own certificate bundles (if needed), or socket pooling via maxSockets.
That seems reasonable. Are you thinking we'd add a node-only request option for this, or something else?
Yes exactly, a node-only request option.
Hmm, are we worried about having to type `agent` in requestOptions? It seems awfully big.
Well, maybe not big, but it's a class rather than an interface:
export class Agent {
    maxFreeSockets: number;
    maxSockets: number;
    sockets: any;
    requests: any;
    constructor(opts?: AgentOptions);
    /**
     * Destroy any sockets that are currently in use by the agent.
     * It is usually not necessary to do this. However, if you are using an agent with KeepAlive enabled,
     * then it is best to explicitly shut down the agent when you know that it will no longer be used. Otherwise,
     * sockets may hang open for quite a long time before the server terminates them.
     */
    destroy(): void;
}
Is it worth creating a compatible interface to that or should we do something like unknown/any?
We talked about this briefly today; it seems like we may be able to use placeholder types eventually, but for now this can be unknown/any.
Is there a way to pass in our own HTTP agent or to specify max sockets today? I'm aiming to allow the use of both node-fetch and the Azure Search SDK in an Azure Function to make requests to <mysearch>.search.windows.net, but need to ensure I don't run into port exhaustion.
Sorry for dredging up an old topic, but this is the closest discussion I've found.
@mdmower please always feel free to open a new issue even if it's just to ask a question. We're happy to help and filing something helps to ensure that your feedback/inquiry won't get lost in the shuffle.
To answer your specific question, I think you could do this by creating a custom `PipelinePolicy` that sets the `agent` property on each `PipelineRequest` made by the Search SDK when performing an operation. To have the SDK use your custom policy, you can pass it via the `additionalPolicies` client option.
If you have trouble getting it to work, feel free to file an issue! 😄
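Such a policy could look roughly like the following sketch. The interfaces below are simplified local stand-ins so the snippet is self-contained (the real types live in @azure/core-rest-pipeline), and the policy name is made up:

```typescript
import { Agent } from "https";

// Simplified local stand-ins for the pipeline types; the real interfaces
// live in @azure/core-rest-pipeline.
interface PipelineRequest {
  url: string;
  agent?: Agent;
}
interface PipelineResponse {
  status: number;
}
type SendRequest = (request: PipelineRequest) => Promise<PipelineResponse>;
interface PipelinePolicy {
  name: string;
  sendRequest(
    request: PipelineRequest,
    next: SendRequest
  ): Promise<PipelineResponse>;
}

// One shared keep-alive agent with a socket cap, to avoid port exhaustion.
const sharedAgent = new Agent({ keepAlive: true, maxSockets: 50 });

// The policy pins every outgoing request to the shared agent before
// handing it to the rest of the pipeline.
const agentPolicy: PipelinePolicy = {
  name: "customAgentPolicy",
  sendRequest(request, next) {
    request.agent = sharedAgent;
    return next(request);
  },
};
```

You'd then pass something like `additionalPolicies: [{ policy: agentPolicy, position: "perCall" }]` in the client options.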
if (request.streamResponseBody) {
  response.readableStreamBody = responseStream;
} else {
  response.bodyAsText = await streamToText(responseStream);
What are your thoughts on making the `streamToText` part of a deserialization policy, so instead we always set the `readableStreamBody` and have access to that?
I feel like this makes it easier for our deserializers to add more validation/transforms if needed.
For example, we could easily add a policy that checks that the number of bytes in the response body matches the content-length. Or have a policy that gunzips the response body as it is streamed from the service. We could even allow an option in the future to give customers direct access to the response in case they wanted to bypass all our magic altogether!
I would love to go down this route, but how do we handle the browser case where the request is always unzipped before it's given to you, and you can't participate in the stream/deserialization process?
The reason this layer is so thick is because we don't want to expose transport differences between node and the browser to policies in the pipeline.
We already do expose transport-level differences though, in that in node.js you can get access to a readableStream, and in the browser you can get access to a Blob.
We'd have to have some policies that only applied for node.js, and some that only applied for browsers. In practice I'd imagine that would mean that we'd create a policy for both runtimes, with one being a no-op if nothing had to be done for that runtime.
Hmm I'd like to think about this some more, but I'm definitely open to refactoring the clients into policies
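For reference, the draining step such a deserialization policy would own might look like this. `streamToText` is the name used in the diff, but the body below is an assumption-laden reconstruction, not the PR's actual helper:

```typescript
import { Readable } from "stream";

// Hypothetical reconstruction of a streamToText helper: drain the stream
// into buffers, then decode once as UTF-8 at the end.
async function streamToText(stream: NodeJS.ReadableStream): Promise<string> {
  const chunks: Buffer[] = [];
  for await (const chunk of stream) {
    chunks.push(typeof chunk === "string" ? Buffer.from(chunk) : (chunk as Buffer));
  }
  return Buffer.concat(chunks).toString("utf-8");
}

// Example: draining a small in-memory stream.
const textPromise = streamToText(Readable.from(["hello, ", "world"]));
```

Decoding once at the end (rather than per chunk) also avoids splitting a multi-byte UTF-8 character across chunk boundaries.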
addProgressListener(xhr.upload, request.onUploadProgress);
addProgressListener(xhr, request.onDownloadProgress);

if (request.formData) {
Same comments from the node http client around handling formData in a policy applies here.
This one seems more doable to me, though it'd be the first time we had a browser vs. node version of a policy that I'm aware of.
We do have a policy today that's different in browsers vs node:
https://github.com/Azure/azure-sdk-for-js/blob/master/sdk/core/core-http/src/policies/disableResponseDecompressionPolicy.browser.ts
https://github.com/Azure/azure-sdk-for-js/blob/master/sdk/core/core-http/src/policies/disableResponseDecompressionPolicy.ts
Though, if it really doesn't make sense to do it in the browser case, then I'm not sure if we should in the node case either (just so that we can keep the expectations on what an HttpClient expects to see the same across environments).
Edit: I'd love if our HttpClient only had to deal with request.body and not request.formData since that would make implementing other HttpClients easier (looking at you fetch or mock clients!), but I don't consider that a blocker!
hmm that is a tempting thing to do
I think I'll look at pulling this out into a policy in my next PR that ports over policies!
Also clean up when we set content length.
if (!body) {
  return 0;
} else if (typeof body === "string") {
  return body.length;
I'd thought the content-length is in bytes, so this should be `body.byteLength` for strings?
JS string length is always the byte length, isn't it? @bterlson as the master of encodings
MDN says it's in UTF-16 code units. I don't know what that means. However, we've seen issues before in storage upload when the string length is different from the number of bytes.
This is a very interesting issue, thanks for pinging me!
This should definitely return the number of bytes (octets) in the encoded body. Hopefully, this will usually be utf-8. Unfortunately, JS string length will give you UTF-16 code units (i.e. the number of 2-byte units comprising the string). These will only agree for BMP characters, and so I think this calculation is incorrect.
It seems a little strange to be doing anything other than accessing byteLength on the encoded buffer before it hits the wire.
OK yeah, I looked at what node-fetch did to double-check this logic, and we can't rely on string length. In those cases we should let node write it out instead of setting the length.
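The discrepancy is easy to demonstrate in node, where `Buffer.byteLength` gives the actual UTF-8 octet count:

```typescript
// String.length counts UTF-16 code units; Content-Length needs encoded bytes.
const ascii = "hello"; // every char is 1 code unit and 1 UTF-8 byte
const accented = "héllo"; // "é" is 1 code unit but 2 UTF-8 bytes
const emoji = "💡"; // a surrogate pair: 2 code units, 4 UTF-8 bytes

console.log(ascii.length, Buffer.byteLength(ascii, "utf-8")); // → 5 5
console.log(accented.length, Buffer.byteLength(accented, "utf-8")); // → 5 6
console.log(emoji.length, Buffer.byteLength(emoji, "utf-8")); // → 2 4
```

The two counts only agree for strings confined to single-byte characters, which is why computing Content-Length from `body.length` silently breaks for non-ASCII payloads.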
This PR represents the next chunk of work in fleshing out the new core-http library by implementing HttpsClient on top of XHR and node's built-in `https` module. This removes the previous dependency on node-fetch (#8483) and exchanges tunnel for node-https-proxy-agent (#6773).
Along the way I figured out a simpler way to test the request machinery without pulling in any massive mock packages. I ported what tests existed in the old library, though it would be easy to add some new additional tests for things like gzip support.