Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PDS pipethrough optimizations #2770

Merged
merged 47 commits into from
Sep 19, 2024
Merged

PDS pipethrough optimizations #2770

merged 47 commits into from
Sep 19, 2024

Conversation

matthieusieben
Copy link
Contributor

No description provided.

@matthieusieben matthieusieben force-pushed the msieben/micro-optimizations branch 4 times, most recently from 7d9350c to f629fc7 Compare September 2, 2024 18:39
@matthieusieben matthieusieben changed the title Micro optimizations in PDS PDS pipethrough optimizations Sep 6, 2024
@matthieusieben
Copy link
Contributor Author

matthieusieben commented Sep 6, 2024

Still need to make sure tests pass and add some more

@@ -43,15 +43,6 @@ export function ssrfFetchWrap<C = FetchContext>({
}

if (url.protocol === 'http:' || url.protocol === 'https:') {
// @ts-expect-error non-standard option
if (request.dispatcher) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dispatcher was never exposed on Request


if (allowUnknownTld !== true && !isValidDomain(url.hostname)) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will be performed through the lookup fn

}
return new Uint8Array(Buffer.concat(bufs))
// Note Buffer is a subclass of Uint8Array
return Buffer.concat(chunks, totalLength)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any reason for changing the return type here?

i've been bit by subtle differences between Buffers & Uint8Arrays before & kinda prefer using just one or the other in most of our stuff is possible. or at least differentiating between the two and saying "bytes" for Uint8Arrays and "buffer" for Buffers

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Buffer is a sub-class of Uint8Array. So it kinda makes no sense to wrap a Buffer within an Uint8Array since we loose functionalities & we create two objects instead of one.
Since this function is only used in the pipethrough code, I will rename the fn.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that common-web already has a streamToBuffer (that actually performs a AsyncIterable<Uint8Array> to Uint8Array). I think it's best if we don't change that function's name so I named this one to streamToNodeBuffer ...

})
}
if (isHandlerPipeThroughBuffer(outputUnvalidated)) {
setHeaders(res, outputUnvalidated)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clean

}
if (!Object.values(v['headers']).every(isString)) {
return false
const createValidator =
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we avoided zod here because it can be kinda heavy duty for validation of stuff that is often invalid. I know it used to handle this by throwing and catching errors which can be rough on Node so we tried to keep it out of the happy/critical path of our server stuff.

But I also have some vague memory that they fixed this & made it faster? Any idea? Or any idea how these compare?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, validating here is way overkill. Since we know the input is of type HandlerOutput, all we need is to discriminate based on that. We can even make some of the checks we currently do disappear !

const createValidator =
<T>(schema: ZodSchema<T>) =>
(v: unknown): v is T =>
schema.safeParse(v).success
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we have a helper function for this that we use in most places in common-web/check

check.is(v, schema)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a check.create(schema) as the semantics I needed are a bit different

return nsid.endsWith('/') ? nsid.slice(0, -1) : nsid // trim trailing slash
// /!\ Hot path

const originalUrl = ('originalUrl' in req && req.originalUrl) || req.url
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could we rewrite as ternary? may be a bit easier to read on first glance

@@ -23,42 +30,52 @@ export const getLocalLag = (local: LocalRecords): number | undefined => {
return Date.now() - new Date(oldest).getTime()
}

export const handleReadAfterWrite = async <T>(
export const pipethroughReadAfterWrite = async <T>(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dig this fn a lot

const safeAgent = new undici.Agent({
allowH2: cfg.fetch.allowHTTP2, // This is experimental
headersTimeout: cfg.fetch.headersTimeout,
maxResponseSize: cfg.fetch.maxResponseSize,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are we cool re-using these config values for both the safeFetch & safeAgent?

only issue I could think of is if we have different tradeoffs on maxResponseSize

@@ -256,12 +263,36 @@ export class AppContext {
appviewCdnUrlPattern: cfg.bskyAppView?.cdnUrlPattern,
})

const safeAgent = new undici.Agent({
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I almost want to call this proxyAgent or safeProxyAgent or something similar to give the sense of what it's for since it's only use there

reqInit: RequestInit,
): Promise<Response> => {
let res: Response
type Accept = [name: string, flags: { q: string }]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thoughts on

type Accept = {
  name: string
  q: string
}

instead of the array syntax? seems more clear & type-safe

Copy link
Contributor Author

@matthieusieben matthieusieben Sep 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea of this code is to be agnostic of the accept-* header being negotiated.

Consider for example:

Accept-Patch: text/example;charset=utf-8, text/example

In that case, the Accept type could be generalized as such:

type Accept = [name: string, flags: Record<string, string>]

When there is an obvious case for generalization, I like to write generic code. I guess this boils down to coding style preferences.

I can make this code more straight forward if you prefer or if you think this is too much YAGNI ?

What type safety difference are you thinking of ?

Copy link
Collaborator

@dholms dholms left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

left some small tweaks/questions, but broadstrokes this looks awesome 💪 super stoked to get this in

@dholms dholms merged commit a07b211 into main Sep 19, 2024
11 checks passed
@dholms dholms deleted the msieben/micro-optimizations branch September 19, 2024 23:24
@github-actions github-actions bot mentioned this pull request Sep 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants