[core-https] Implement HttpsClient interface for node and browser #9082

Merged (26 commits) on Jun 2, 2020

Conversation

@xirzec (Member) commented on May 22, 2020:

This PR represents the next chunk of work in fleshing out the new core-https library by implementing HttpsClient on top of XHR and node's built-in https module.

This removes the previous dependency on node-fetch (#8483) and exchanges tunnel for node-https-proxy-agent (#6773).

Along the way I figured out a simpler way to test the request machinery without pulling in any massive mock packages. I ported the tests that existed in the old library, though it would be easy to add new tests for things like gzip support.
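
(For orientation, the client shape being implemented is roughly the sketch below; the real interface lives in sdk/core/core-https/src/interfaces.ts, so the exact member names here are an assumption, not a quote.)

interface HttpsClient {
  /** Dispatch a prepared pipeline request and resolve with the service's response. */
  sendRequest(request: PipelineRequest): Promise<PipelineResponse>;
}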

@xirzec added the Azure.Core and Client labels on May 22, 2020
@xirzec self-assigned this on May 22, 2020
}
}

if (!request.skipDecompressResponse) {
Member:

Do we want to clear Accept-Encoding header for the else case?

@xirzec (Member Author):

I suppose we could... but what would the scenario be? The user sets it as a manual header in the operation or another policy in the pipeline sets it?
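
(For illustration, a minimal sketch of what clearing the header in the else case might look like, assuming the headers collection exposes set and delete:)

if (!request.skipDecompressResponse) {
  request.headers.set("Accept-Encoding", "gzip,deflate");
} else {
  // Hypothetical: drop a compressed-encoding request that the user or an
  // earlier policy may have set, since the response won't be decompressed.
  request.headers.delete("Accept-Encoding");
}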

@joheredi (Member) left a review comment:

Great stuff! I just have a couple of minor comments.

 * Get the JSON object representation of this HTTP header collection.
 */
public toJson(): RawHttpHeaders {
  return this.raw();
Member:

This and raw() do the same thing, right? Is this here for backwards compat, or is the plan for toJson to do something else in the future?

@xirzec (Member Author):

You're right, this is silly. I'm dropping raw and keeping only toJSON.

I also realized that it needs to be toJSON rather than toJson in order to make JSON.stringify do the right thing. I fixed that too.

Member:

Why is it different to have toJson vs toJSON?

@xirzec (Member Author):

JSON.stringify looks for a toJSON method when serializing to decide how to encode objects: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/JSON/stringify
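
(A self-contained sketch, not the library's actual class, showing why the exact casing matters:)

type RawHttpHeaders = { [headerName: string]: string };

class SketchHeaders {
  private map = new Map<string, string>();

  public set(name: string, value: string): void {
    this.map.set(name.toLowerCase(), value);
  }

  // JSON.stringify only recognizes a method named exactly toJSON;
  // a method named toJson would be ignored during serialization.
  public toJSON(): RawHttpHeaders {
    const result: RawHttpHeaders = {};
    for (const [name, value] of this.map) {
      result[name] = value;
    }
    return result;
  }
}

const headers = new SketchHeaders();
headers.set("Content-Type", "application/json");
JSON.stringify({ headers }); // '{"headers":{"content-type":"application/json"}}'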

}, request.timeout);
}

if (request.formData) {
Member:

Do you think it would make sense to extract these request preparation steps into their own functions? I think it would help readability and make it easier to focus on the core of the function, which is sending the request.

@xirzec (Member Author) commented on May 26, 2020:

Sure. Pulled this out into its own function.

try {
  const result = await new Promise<PipelineResponse>((resolve, reject) => {
    const agent = getOrCreateAgent(request);
    const url = new URL(request.url);
Member:

Same suggestion as above: some of these blocks could be extracted into their own functions to help readability. For example:

const options: https.RequestOptions = getRequestOptions(request);
...
const headers = getResponseHeaders(res)

@xirzec (Member Author):

Pulled these out too.
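
(As an illustration of the extraction, a sketch of one such helper; the name follows the reviewer's example, but the body is an assumption rather than the PR's actual code:)

import type { IncomingMessage } from "http";

function getResponseHeaders(res: IncomingMessage): Record<string, string> {
  const headers: Record<string, string> = {};
  for (const [name, value] of Object.entries(res.headers)) {
    if (Array.isArray(value)) {
      // Fold repeated headers into a single comma-separated value.
      headers[name] = value.join(", ");
    } else if (value !== undefined) {
      headers[name] = value;
    }
  }
  return headers;
}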

@xirzec (Member Author) commented on May 27, 2020:

@joheredi @jeremymeng Are there any changes that are blocking your approval?

@bterlson @chradek any feedback?

@jeremymeng (Member) left a review comment:

LGTM

@chradek (Contributor) commented on May 27, 2020:

@xirzec Sorry I missed the notification on this, I'll review now.

@joheredi (Member) left a review comment:

Looks good!

Resolved thread on sdk/core/core-https/review/core-https.api.md
 * If the request is terminated, an `AbortError` is thrown.
 * Defaults to 0, which disables the timeout.
 */
timeout: number;
Contributor:

Just to be clear, is this timeout meant to be inclusive of retries?

@xirzec (Member Author):

The way this works today, each retry uses this timeout for its own request. E.g. if you set it to 2s and the retry policy tried 3 times, you would wait 6s total before failing.

If this is expected/desirable, I can update the comment.

/cc @bterlson

@xirzec (Member Author):

I was wrong about this: the retry policy doesn't retry abort errors today, and the timeout is modeled as an abort error.

Resolved thread on sdk/core/core-https/src/interfaces.ts
Comment on lines 23 to 24
let keepAliveAgent: https.Agent;
let proxyAgent: https.Agent;
Contributor:

Is this the right place to cache the agents? It seems like this could be problematic if there were multiple service clients that needed to connect to different proxies, or even if the proxySettings needed to change so a client was recreated with the new settings.

@xirzec (Member Author):

Ah, you're right. In the existing implementation, these are cached on the class instance. I need to fix this!

@xirzec (Member Author):

Fixed in the latest version
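
(A sketch of the shape of that fix, caching agents per client instance instead of at module scope; the class and method signatures here are assumptions, not the PR's actual code:)

import * as https from "https";

class NodeHttpsClient {
  // Cached per instance, so clients created with different settings no
  // longer share module-level agents (the proxy path is omitted for brevity).
  private keepAliveAgent?: https.Agent;

  private getOrCreateAgent(disableKeepAlive: boolean): https.Agent {
    if (disableKeepAlive) {
      return https.globalAgent;
    }
    if (!this.keepAliveAgent) {
      this.keepAliveAgent = new https.Agent({ keepAlive: true });
    }
    return this.keepAliveAgent;
  }
}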

}

function getRequestOptions(request: PipelineRequest): https.RequestOptions {
  const agent = getOrCreateAgent(request);
Contributor:

Are you planning on eventually letting a user pass in their own agent? I'd love to be able to do that so I can configure my own certificate bundles (if needed), or socket pooling via maxSockets.

@xirzec (Member Author):

That seems reasonable. Are you thinking we'd add a node-only request option for this, or something else?

Contributor:

Yes exactly, a node-only request option.

@xirzec (Member Author):

Hmm, are we worried about having to type agent in the request options? It seems awfully big.

@xirzec (Member Author):

Well, maybe not big, but it's a class rather than an interface:

export class Agent {
    maxFreeSockets: number;
    maxSockets: number;
    sockets: any;
    requests: any;

    constructor(opts?: AgentOptions);

    /**
     * Destroy any sockets that are currently in use by the agent.
     * It is usually not necessary to do this. However, if you are using an agent with KeepAlive enabled,
     * then it is best to explicitly shut down the agent when you know that it will no longer be used. Otherwise,
     * sockets may hang open for quite a long time before the server terminates them.
     */
    destroy(): void;
}

Is it worth creating a compatible interface for that, or should we do something like unknown/any?

@xirzec (Member Author):

We talked about this briefly today; it seems like we may be able to use placeholder types eventually, but for now this can be unknown/any.
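
(A sketch of the loosely typed node-only option that conclusion implies; the interface and property names are assumptions for illustration:)

interface PipelineRequestOptions {
  url: string;
  // Node-only. Typed loosely so the public surface doesn't have to
  // re-declare the https.Agent class; the node client narrows it back
  // to https.Agent before sending.
  agent?: unknown;
}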

@mdmower:

Is there a way to pass in our own HTTP agent or to specify max sockets today? I'm aiming to use both node-fetch and the Azure Search SDK in an Azure Function to make requests to <mysearch>.search.windows.net, but need to ensure I don't run into port exhaustion.

Sorry for dredging up an old topic, but this is the closest discussion I've found.

@xirzec (Member Author):

@mdmower please always feel free to open a new issue even if it's just to ask a question. We're happy to help and filing something helps to ensure that your feedback/inquiry won't get lost in the shuffle.

To answer your specific question, I think you could do this by creating a custom PipelinePolicy that sets the agent property on each PipelineRequest made by the Search SDK when performing an operation. To have the SDK use your custom policy, you can pass it via the additionalPolicies client option.

If you have trouble getting it to work, feel free to file an issue! 😄
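
(A sketch of that suggestion; the package import, the agent property on the request, and the SearchClient wiring are assumptions based on this thread rather than verified API:)

import * as https from "https";
import type { PipelinePolicy } from "@azure/core-rest-pipeline";

const sharedAgent = new https.Agent({ keepAlive: true, maxSockets: 50 });

const agentPolicy: PipelinePolicy = {
  name: "agentPolicy",
  sendRequest(request, next) {
    // Reuse one agent across all requests so sockets are pooled.
    request.agent = sharedAgent;
    return next(request);
  },
};

// Hypothetical wiring into a Search client:
// new SearchClient(endpoint, indexName, credential, {
//   additionalPolicies: [{ policy: agentPolicy, position: "perCall" }],
// });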

@mdmower:

Thanks @xirzec, I really appreciate the reply. I haven't figured it out yet, so I posted a new issue.

Resolved thread on sdk/core/core-https/src/nodeHttpsClient.ts
if (request.streamResponseBody) {
  response.readableStreamBody = responseStream;
} else {
  response.bodyAsText = await streamToText(responseStream);
Contributor:

What are your thoughts on making streamToText part of a deserialization policy, so that instead we always set readableStreamBody and have access to that?

I feel like this makes it easier for our deserializers to add more validation/transforms if needed.
For example, we could easily add a policy that checks that the number of bytes in the response body matches the content-length. Or have a policy that gunzips the response body as it is streamed from the service. We could even allow an option in the future to give customers direct access to the response in case they wanted to bypass all our magic altogether!
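
(For reference, a minimal sketch of what a streamToText helper might look like; the PR's actual implementation may differ:)

import type { Readable } from "stream";

function streamToText(stream: Readable): Promise<string> {
  return new Promise((resolve, reject) => {
    const chunks: Buffer[] = [];
    stream.on("data", (chunk) => {
      chunks.push(typeof chunk === "string" ? Buffer.from(chunk) : chunk);
    });
    stream.on("end", () => {
      // Drain the whole stream, then decode it as UTF-8 text.
      resolve(Buffer.concat(chunks).toString("utf8"));
    });
    stream.on("error", reject);
  });
}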

@xirzec (Member Author):

I would love to go down this route, but how do we handle the browser case, where the response is always unzipped before it's given to you and you can't participate in the stream/deserialization process?

The reason this layer is so thick is that we don't want to expose transport differences between node and the browser to policies in the pipeline.

Contributor:

We already do expose transport-level differences, though, in that in node.js you can get access to a readableStream, and in the browser you can get access to a Blob.

We'd have to have some policies that only applied for node.js, and some that only applied for browsers. In practice I'd imagine that would mean that we'd create a policy for both runtimes, with one being a no-op if nothing had to be done for that runtime.

@xirzec (Member Author):

Hmm, I'd like to think about this some more, but I'm definitely open to refactoring the clients into policies.

Resolved thread on sdk/core/core-https/src/restError.ts
addProgressListener(xhr.upload, request.onUploadProgress);
addProgressListener(xhr, request.onDownloadProgress);

if (request.formData) {
Contributor:

The same comment from the node HTTP client, about handling formData in a policy, applies here.

@xirzec (Member Author):

This one seems more doable to me, though it'd be the first time we had a browser vs. a node version of a policy, that I'm aware of.

@chradek (Contributor) commented on May 28, 2020:

We do have a policy today that's different in browsers vs node:
https://github.com/Azure/azure-sdk-for-js/blob/master/sdk/core/core-http/src/policies/disableResponseDecompressionPolicy.browser.ts
https://github.com/Azure/azure-sdk-for-js/blob/master/sdk/core/core-http/src/policies/disableResponseDecompressionPolicy.ts

Though, if it really doesn't make sense to do it in the browser case, then I'm not sure we should in the node case either, just so that we can keep the expectations of what an HttpClient sees the same across environments.

Edit: I'd love if our HttpClient only had to deal with request.body and not request.formData since that would make implementing other HttpClients easier (looking at you fetch or mock clients!), but I don't consider that a blocker!
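
(A sketch of what pushing formData into a policy could look like, assuming the pipeline exposes formData and body on the request; multipart bodies would need different handling and are omitted:)

import type { PipelinePolicy } from "@azure/core-rest-pipeline";

const formDataPolicy: PipelinePolicy = {
  name: "formDataPolicy",
  sendRequest(request, next) {
    if (request.formData) {
      // Encode simple name/value pairs so the HttpClient only sees body.
      const params = new URLSearchParams();
      for (const [name, value] of Object.entries(request.formData)) {
        params.append(name, String(value));
      }
      request.headers.set("Content-Type", "application/x-www-form-urlencoded");
      request.body = params.toString();
      request.formData = undefined;
    }
    return next(request);
  },
};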

@xirzec (Member Author):

Hmm, that is a tempting thing to do.

@xirzec (Member Author):

I think I'll look at pulling this out into a policy in my next PR that ports over policies!

if (!body) {
  return 0;
} else if (typeof body === "string") {
  return body.length;
Member:

I'd thought Content-Length is in bytes, so shouldn't this be the byte length rather than body.length for strings?

@xirzec (Member Author):

JS string length is always the byte length, isn't it? @bterlson as the master of encodings

Member:

MDN says it's in UTF-16 code units. I don't know what that means. However, we've seen issues before in storage uploads when the string length differs from the number of bytes.

@bterlson (Member) commented on May 28, 2020:

This is a very interesting issue, thanks for pinging me!

This should definitely return the number of bytes (octets) in the encoded body. Hopefully, this will usually be UTF-8. Unfortunately, JS string length gives you UTF-16 code units (i.e. the number of 2-byte units comprising the string). These only agree with the UTF-8 byte count for ASCII characters, so I think this calculation is incorrect.

It seems a little strange to be doing anything other than accessing byteLength on the encoded buffer before it hits the wire.

@xirzec (Member Author):

OK yeah, I looked at what node-fetch did to double-check this logic, and we can't rely on string length. In those cases we should let node write it out instead of setting the length.
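
(A quick illustration of the discrepancy:)

const body = "héllo 👋";

body.length;                      // 8 UTF-16 code units
Buffer.byteLength(body, "utf8");  // 11 bytes on the wire

// So for string bodies, a correct Content-Length would be:
const contentLength = Buffer.byteLength(body, "utf8");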
