[core-https] Implement HttpsClient interface for node and browser #9082

Merged (26 commits) on Jun 2, 2020

Conversation

@xirzec (Member) commented on May 22, 2020:

This PR represents the next chunk of work in fleshing out the new core-https library by implementing HttpsClient on top of XHR and node's built-in https module.

This removes the previous dependency on node-fetch (#8483) and exchanges tunnel for node-https-proxy-agent (#6773).

Along the way I figured out a simpler way to test the request machinery without pulling in any massive mock packages. I ported the tests that existed in the old library, though it would be easy to add new tests for things like gzip support.
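
(For orientation, the client shape being implemented is roughly the sketch below; the real interface lives in sdk/core/core-https/src/interfaces.ts, so the exact member names here are an assumption, not a quote.)

interface HttpsClient {
  /** Dispatch a prepared pipeline request and resolve with the service's response. */
  sendRequest(request: PipelineRequest): Promise<PipelineResponse>;
}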

@xirzec added the Azure.Core and Client labels on May 22, 2020
@xirzec self-assigned this on May 22, 2020
}
}

if (!request.skipDecompressResponse) {
Member:

Do we want to clear Accept-Encoding header for the else case?

@xirzec (Member Author):

I suppose we could... but what would the scenario be? The user sets it as a manual header in the operation or another policy in the pipeline sets it?
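
(For illustration, a minimal sketch of what clearing the header in the else case might look like, assuming the headers collection exposes set and delete:)

if (!request.skipDecompressResponse) {
  request.headers.set("Accept-Encoding", "gzip,deflate");
} else {
  // Hypothetical: drop a compressed-encoding request that the user or an
  // earlier policy may have set, since the response won't be decompressed.
  request.headers.delete("Accept-Encoding");
}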

@joheredi (Member) left a review comment:

Great stuff! I just have a couple of minor comments.

 * Get the JSON object representation of this HTTP header collection.
 */
public toJson(): RawHttpHeaders {
  return this.raw();
Member:

This and raw() do the same thing, right? Is this here for backwards compat, or is the plan for toJson to do something else in the future?

@xirzec (Member Author):

You're right, this is silly. I'm dropping raw and keeping only toJSON.

I also realized that it needs to be toJSON rather than toJson in order to make JSON.stringify do the right thing. I fixed that too.

Member:

Why is it different to have toJson vs toJSON?

@xirzec (Member Author):

JSON.stringify looks for a toJSON method when serializing to decide how to encode objects: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/JSON/stringify
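
(A self-contained sketch, not the library's actual class, showing why the exact casing matters:)

type RawHttpHeaders = { [headerName: string]: string };

class SketchHeaders {
  private map = new Map<string, string>();

  public set(name: string, value: string): void {
    this.map.set(name.toLowerCase(), value);
  }

  // JSON.stringify only recognizes a method named exactly toJSON;
  // a method named toJson would be ignored during serialization.
  public toJSON(): RawHttpHeaders {
    const result: RawHttpHeaders = {};
    for (const [name, value] of this.map) {
      result[name] = value;
    }
    return result;
  }
}

const headers = new SketchHeaders();
headers.set("Content-Type", "application/json");
JSON.stringify({ headers }); // '{"headers":{"content-type":"application/json"}}'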

}, request.timeout);
}

if (request.formData) {
Member:

Do you think it would make sense to extract these request preparation steps into their own functions? I think it would help readability and make it easier to focus on the core of the function, which is sending the request.

@xirzec (Member Author) commented on May 26, 2020:

Sure. Pulled this out into its own function.

try {
  const result = await new Promise<PipelineResponse>((resolve, reject) => {
    const agent = getOrCreateAgent(request);
    const url = new URL(request.url);
Member:

Same suggestion as above: some of these blocks could be extracted into their own functions to help readability. For example:

const options: https.RequestOptions = getRequestOptions(request);
...
const headers = getResponseHeaders(res)

@xirzec (Member Author):

Pulled these out too.
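
(As an illustration of the extraction, a sketch of one such helper; the name follows the reviewer's example, but the body is an assumption rather than the PR's actual code:)

import type { IncomingMessage } from "http";

function getResponseHeaders(res: IncomingMessage): Record<string, string> {
  const headers: Record<string, string> = {};
  for (const [name, value] of Object.entries(res.headers)) {
    if (Array.isArray(value)) {
      // Fold repeated headers into a single comma-separated value.
      headers[name] = value.join(", ");
    } else if (value !== undefined) {
      headers[name] = value;
    }
  }
  return headers;
}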

@xirzec (Member Author) commented on May 27, 2020:

@joheredi @jeremymeng Are there any changes that are blocking your approval?

@bterlson @chradek any feedback?

@jeremymeng (Member) left a review comment:

LGTM

@chradek (Contributor) commented on May 27, 2020:

@xirzec Sorry I missed the notification on this, I'll review now.

@joheredi (Member) left a review comment:

Looks good!

Resolved thread on sdk/core/core-https/review/core-https.api.md
 * If the request is terminated, an `AbortError` is thrown.
 * Defaults to 0, which disables the timeout.
 */
timeout: number;
Contributor:

Just to be clear, is this timeout meant to be inclusive of retries?

@xirzec (Member Author):

The way this works today, each retry uses this timeout for its own request. E.g. if you set it to 2s and the retry policy tried 3 times, you would wait 6s total before failing.

If this is expected/desirable, I can update the comment.

/cc @bterlson

@xirzec (Member Author):

I was wrong about this: the retry policy doesn't retry abort errors today, and the timeout is modeled as an abort error.

Resolved thread on sdk/core/core-https/src/interfaces.ts
Comment on lines 23 to 24
let keepAliveAgent: https.Agent;
let proxyAgent: https.Agent;
Contributor:

Is this the right place to cache the agents? It seems like this could be problematic if there were multiple service clients that needed to connect to different proxies, or even if the proxySettings needed to change so a client was recreated with the new settings.

@xirzec (Member Author):

Ah, you're right. In the existing implementation, these are cached on the class instance. I need to fix this!

@xirzec (Member Author):

Fixed in the latest version
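
(A sketch of the shape of that fix, caching agents per client instance instead of at module scope; the class and method signatures here are assumptions, not the PR's actual code:)

import * as https from "https";

class NodeHttpsClient {
  // Cached per instance, so clients created with different settings no
  // longer share module-level agents (the proxy path is omitted for brevity).
  private keepAliveAgent?: https.Agent;

  private getOrCreateAgent(disableKeepAlive: boolean): https.Agent {
    if (disableKeepAlive) {
      return https.globalAgent;
    }
    if (!this.keepAliveAgent) {
      this.keepAliveAgent = new https.Agent({ keepAlive: true });
    }
    return this.keepAliveAgent;
  }
}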

}

function getRequestOptions(request: PipelineRequest): https.RequestOptions {
  const agent = getOrCreateAgent(request);
Contributor:

Are you planning on eventually letting a user pass in their own agent? I'd love to be able to do that so I can configure my own certificate bundles (if needed), or socket pooling via maxSockets.

@xirzec (Member Author):

That seems reasonable. Are you thinking we'd add a node-only request option for this, or something else?

Contributor:

Yes exactly, a node-only request option.

@xirzec (Member Author):

Hmm, are we worried about having to type agent in the request options? It seems awfully big.

@xirzec (Member Author):

Well, maybe not big, but it's a class rather than an interface:

export class Agent {
    maxFreeSockets: number;
    maxSockets: number;
    sockets: any;
    requests: any;

    constructor(opts?: AgentOptions);

    /**
     * Destroy any sockets that are currently in use by the agent.
     * It is usually not necessary to do this. However, if you are using an agent with KeepAlive enabled,
     * then it is best to explicitly shut down the agent when you know that it will no longer be used. Otherwise,
     * sockets may hang open for quite a long time before the server terminates them.
     */
    destroy(): void;
}

Is it worth creating a compatible interface for that, or should we do something like unknown/any?

@xirzec (Member Author):

We talked about this briefly today; it seems like we may be able to use placeholder types eventually, but for now this can be unknown/any.
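
(A sketch of the loosely typed node-only option that conclusion implies; the interface and property names are assumptions for illustration:)

interface PipelineRequestOptions {
  url: string;
  // Node-only. Typed loosely so the public surface doesn't have to
  // re-declare the https.Agent class; the node client narrows it back
  // to https.Agent before sending.
  agent?: unknown;
}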

@mdmower:

Is there a way to pass in our own HTTP agent or to specify max sockets today? I'm aiming to use both node-fetch and the Azure Search SDK in an Azure Function to make requests to <mysearch>.search.windows.net, but need to ensure I don't run into port exhaustion.

Sorry for dredging up an old topic, but this is the closest discussion I've found.

@xirzec (Member Author):

@mdmower please always feel free to open a new issue even if it's just to ask a question. We're happy to help and filing something helps to ensure that your feedback/inquiry won't get lost in the shuffle.

To answer your specific question, I think you could do this by creating a custom PipelinePolicy that sets the agent property on each PipelineRequest made by the Search SDK when performing an operation. To have the SDK use your custom policy, you can pass it via the additionalPolicies client option.

If you have trouble getting it to work, feel free to file an issue! 😄
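
(A sketch of that suggestion; the package import, the agent property on the request, and the SearchClient wiring are assumptions based on this thread rather than verified API:)

import * as https from "https";
import type { PipelinePolicy } from "@azure/core-rest-pipeline";

const sharedAgent = new https.Agent({ keepAlive: true, maxSockets: 50 });

const agentPolicy: PipelinePolicy = {
  name: "agentPolicy",
  sendRequest(request, next) {
    // Reuse one agent across all requests so sockets are pooled.
    request.agent = sharedAgent;
    return next(request);
  },
};

// Hypothetical wiring into a Search client:
// new SearchClient(endpoint, indexName, credential, {
//   additionalPolicies: [{ policy: agentPolicy, position: "perCall" }],
// });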

@mdmower:

Thanks @xirzec, I really appreciate the reply. I haven't figured it out yet, so I posted a new issue.

Resolved thread on sdk/core/core-https/src/nodeHttpsClient.ts
if (request.streamResponseBody) {
  response.readableStreamBody = responseStream;
} else {
  response.bodyAsText = await streamToText(responseStream);
Contributor:

What are your thoughts on making streamToText part of a deserialization policy, so that instead we always set readableStreamBody and have access to that?

I feel like this makes it easier for our deserializers to add more validation/transforms if needed.
For example, we could easily add a policy that checks that the number of bytes in the response body matches the content-length. Or have a policy that gunzips the response body as it is streamed from the service. We could even allow an option in the future to give customers direct access to the response in case they wanted to bypass all our magic altogether!
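
(For reference, a minimal sketch of what a streamToText helper might look like; the PR's actual implementation may differ:)

import type { Readable } from "stream";

function streamToText(stream: Readable): Promise<string> {
  return new Promise((resolve, reject) => {
    const chunks: Buffer[] = [];
    stream.on("data", (chunk) => {
      chunks.push(typeof chunk === "string" ? Buffer.from(chunk) : chunk);
    });
    stream.on("end", () => {
      // Drain the whole stream, then decode it as UTF-8 text.
      resolve(Buffer.concat(chunks).toString("utf8"));
    });
    stream.on("error", reject);
  });
}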

@xirzec (Member Author):

I would love to go down this route, but how do we handle the browser case, where the response is always unzipped before it's given to you and you can't participate in the stream/deserialization process?

The reason this layer is so thick is that we don't want to expose transport differences between node and the browser to policies in the pipeline.

Contributor:

We already do expose transport-level differences, though, in that in node.js you can get access to a readableStream, and in the browser you can get access to a Blob.

We'd have to have some policies that only applied for node.js, and some that only applied for browsers. In practice I'd imagine that would mean that we'd create a policy for both runtimes, with one being a no-op if nothing had to be done for that runtime.

@xirzec (Member Author):

Hmm, I'd like to think about this some more, but I'm definitely open to refactoring the clients into policies.

Resolved thread on sdk/core/core-https/src/restError.ts
addProgressListener(xhr.upload, request.onUploadProgress);
addProgressListener(xhr, request.onDownloadProgress);

if (request.formData) {
Contributor:

The same comment from the node HTTP client, about handling formData in a policy, applies here.

@xirzec (Member Author):

This one seems more doable to me, though it'd be the first time we had a browser vs. a node version of a policy, that I'm aware of.

@chradek (Contributor) commented on May 28, 2020:

We do have a policy today that's different in browsers vs node:
https://github.com/Azure/azure-sdk-for-js/blob/master/sdk/core/core-http/src/policies/disableResponseDecompressionPolicy.browser.ts
https://github.com/Azure/azure-sdk-for-js/blob/master/sdk/core/core-http/src/policies/disableResponseDecompressionPolicy.ts

Though, if it really doesn't make sense to do it in the browser case, then I'm not sure we should in the node case either, just so that we can keep the expectations of what an HttpClient sees the same across environments.

Edit: I'd love if our HttpClient only had to deal with request.body and not request.formData since that would make implementing other HttpClients easier (looking at you fetch or mock clients!), but I don't consider that a blocker!
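
(A sketch of what pushing formData into a policy could look like, assuming the pipeline exposes formData and body on the request; multipart bodies would need different handling and are omitted:)

import type { PipelinePolicy } from "@azure/core-rest-pipeline";

const formDataPolicy: PipelinePolicy = {
  name: "formDataPolicy",
  sendRequest(request, next) {
    if (request.formData) {
      // Encode simple name/value pairs so the HttpClient only sees body.
      const params = new URLSearchParams();
      for (const [name, value] of Object.entries(request.formData)) {
        params.append(name, String(value));
      }
      request.headers.set("Content-Type", "application/x-www-form-urlencoded");
      request.body = params.toString();
      request.formData = undefined;
    }
    return next(request);
  },
};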

@xirzec (Member Author):

Hmm, that is a tempting thing to do.

@xirzec (Member Author):

I think I'll look at pulling this out into a policy in my next PR that ports over policies!

if (!body) {
  return 0;
} else if (typeof body === "string") {
  return body.length;
Member:

I'd thought Content-Length is in bytes, so shouldn't this be the byte length rather than body.length for strings?

@xirzec (Member Author):

JS string length is always the byte length, isn't it? @bterlson as the master of encodings

Member:

MDN says it's in UTF-16 code units. I don't know what that means. However, we've seen issues before in storage uploads when the string length differs from the number of bytes.

@bterlson (Member) commented on May 28, 2020:

This is a very interesting issue, thanks for pinging me!

This should definitely return the number of bytes (octets) in the encoded body. Hopefully, this will usually be UTF-8. Unfortunately, JS string length gives you UTF-16 code units (i.e. the number of 2-byte units comprising the string). These only agree with the UTF-8 byte count for ASCII characters, so I think this calculation is incorrect.

It seems a little strange to be doing anything other than accessing byteLength on the encoded buffer before it hits the wire.

@xirzec (Member Author):

OK yeah, I looked at what node-fetch did to double-check this logic, and we can't rely on string length. In those cases we should let node write it out instead of setting the length.
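
(A quick illustration of the discrepancy:)

const body = "héllo 👋";

body.length;                      // 8 UTF-16 code units
Buffer.byteLength(body, "utf8");  // 11 bytes on the wire

// So for string bodies, a correct Content-Length would be:
const contentLength = Buffer.byteLength(body, "utf8");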
