Refactor to remove the need for a PassThrough stream #15

mcollina · 2023-01-13T14:31:29Z

in fastify-static, we need a PassThrough object to convert Send to an actual Writable:

https://github.com/fastify/fastify-static/blob/master/index.js#L113-L140

I think this module should be refactored to in combination with fastify-static to remove the need for the wrapper completely.

The text was updated successfully, but these errors were encountered:

Uzlopak · 2023-01-14T12:39:34Z

I have no clue what is needed to do it.

mcollina · 2023-01-14T19:38:50Z

Currently Send() is a stream-like object that don't really quack like a Stream.

It's also a "fake" synchronous API. There are two sets of asynchronous ops:

gathet the stats of the file
start the stream

Both of then happen after a call to pipe().

Essentially we need to refactor it so we split them. Get the metadata and then read the stream only if it's needed (in case or conditional gets).

climba03003 · 2023-01-18T09:01:45Z

@mcollina Just some question before start.

Do you means Duplex instead of Writable? It should be a middle stream between fs.createReadStream and OutgoingMessage.

I assume it is what we would like to achieve?

import http from 'http'
import SendStream from '@fastify/send'
import { pipeline } from 'stream'

const app = http.createServer(function (req, res) {
  // the options include the duplex options
  const stream = new SendStream(req, req.url, { root: './' })
  pipeline(stream, res, function(err) {
    console.log(err)
  })
})

Does it means that we need to inspect on pipe and unpipe event?

graph TD
    A[SendStream] --> |created| B{Event Emitter}
    B --> |ready| C[Waiting for Pipe]
    B --> |pipe| D[Waiting for Ready]
    B --> |unpipe| E{Check Status}
    B --> |error| G[Cleanup and Destory]
    C --> |is pipe| F[Create File Readable]
    D --> |is ready| F
    E --> |flowing| G[Cleanup and Destory]
    E --> |finished| G
    E --> |never flow| C
    F --> |.pipe this| H[Streaming]
    H --> |end| G

mcollina · 2023-01-18T11:08:08Z

No, not really.

I think we need something like:

import http from 'http'
import send from '@fastify/send'
import { pipeline } from 'stream'

const app = http.createServer(async function (req, res) {
  // the options include the duplex options
  const { statusCode, headers, stream } = await send(req, req.url, { root: './' })
  res.writeHead(statusCode, headers)
  if (stream) {
    pipeline(stream, res, function(err) {
      console.log(err)
    })
  } else {
    res.end()
  }
})

It might be callback-based instead of promise-based.

serzero2007 · 2023-07-30T11:49:42Z

No, not really.

I think we need something like:

import http from 'http'
import send from '@fastify/send'
import { pipeline } from 'stream'

const app = http.createServer(async function (req, res) {
  // the options include the duplex options
  const { statusCode, headers, stream } = await send(req, req.url, { root: './' })
  res.writeHead(statusCode, headers)
  if (stream) {
    pipeline(stream, res, function(err) {
      console.log(err)
    })
  } else {
    res.end()
  }
})

It might be callback-based instead of promise-based.

Please check if you are interested in my proof of concept implementation of proposed API here:
https://gist.github.com/serzero2007/c6c48b4c90c4318edcc17b458cf876bd

All imported functions are from this repository. The code is just a draft rewrite of existing code to use new API and not tested at all. Index support is not done yet and also it's not clear how to re-implement events in the new API.

Looking forward to your feedback.

mcollina · 2023-07-30T14:50:49Z

This is amazing! Would you mind to send a PR?

serzero2007 · 2023-07-30T15:59:35Z

Well, it will take some time to make PR. It took 2 hours to get POC, but PR will take days.

There are questions we need to discuss first:

The API is changed dramatically. I think it's better to create a separate package with a new API, rather than breaking compatibility with previous API or bundling both APIs in the same package. What do you think?
Events API. We need somehow bring the functionality of it to a new API. Any suggestions?

Uzlopak · 2023-07-30T16:29:24Z

We use fastify/send for our fastify/static module. And only we are currently using this package. So you are imho free to break the api, as long it works flawless in fastify/static. We would also need to bench the performance. I dont know ad hoc, if we need the events api. Again, it has to work flawless in fastify/static.

serzero2007 · 2023-07-30T18:06:28Z

Well, it obviously will require changes on the side of fastify/static, because API will have changed. For beginning, I will just drop event api. I suppose it is not a big deal to add it afterwards.

mcollina · 2023-07-31T07:51:41Z

This will give significant performance improvement to fastify-static, we are definitely happy to make the relevant the changes in that module.

We are not using the events API, so no problems there. We do support the index capability, however I think we could move it to @fastify/static or even drop that altogether.

serzero2007 · 2023-07-31T11:31:14Z

You may track progress here:
https://github.com/serzero2007/send/blob/proofofconceptnewapi/lib/parsePath.js
https://github.com/serzero2007/send/blob/proofofconceptnewapi/lib/parseOptions.js
https://github.com/serzero2007/send/blob/proofofconceptnewapi/lib/send.js
https://github.com/serzero2007/send/blob/proofofconceptnewapi/lib/getStatAndPath.js
https://github.com/serzero2007/send/blob/proofofconceptnewapi/lib/conditionalGet.js
(I will edit this comment to include more files as they appear)

I am not experienced with writing tests in tap, so probably I will need some help here. For example, here:
https://github.com/serzero2007/send/blob/proofofconceptnewapi/test/send.test.js

serzero2007 · 2023-07-31T20:48:58Z

I have a question. Why here

const etagL = etag.length
const isMatching = parseTokenList(ifNoneMatch, function (match) {
      const mL = match.length

      if (
        (etagL === mL && match === etag) ||
            (etagL > mL && 'W/' + match === etag)
      ) {
        return true
      }
})

do we compare the lengths of strings at all? Could we just compare strings itself? Is it a performance (micro)optimisation? Is it worth it after all? I am going to drop it and just compare strings:

function isIfNoneMatchFailure (req, context) {
    const ifNoneMatch = req.headers['if-none-match']

    if (ifNoneMatch === '*') return true
    if (typeof context.etag !== 'string') return false
    
    function checker(match) {
        if (match === context.etag) return true
        if ('W/' + match === context.etag) return true
        return undefined
    }

    const isMatching = parseTokenList(ifNoneMatch, checker, false)
    return isMatching
}

Uzlopak · 2023-07-31T21:05:42Z

Yes the length comparison is a perf optimization. It is worth.

serzero2007 · 2023-07-31T21:10:57Z

It's strange because it was only in one of two places in the code where this kind of things happen. Well, then I will do it in both places:

function isIfMatchPreconditionFailure (req, context) {
  const ifMatch = req.headers['if-match']
  if (!ifMatch || ifMatch === '*') return false

  const etag = context.etag
  const etagL = etag.length
  function checker (match) {
    const mL = match.length
    if (etagL === mL && match === etag) return true
    if (etagL > mL && 'W/' + match === etag) return true
    return undefined
  }

  const isMatching = parseTokenList(ifMatch, checker, false)

  return !isMatching
}

serzero2007@2cf019f

serzero2007 · 2023-07-31T21:18:35Z

When we return 304, we should

res.removeHeader('Content-Encoding')
res.removeHeader('Content-Language')
res.removeHeader('Content-Length')
res.removeHeader('Content-Range')
res.removeHeader('Content-Type')

but in this API we only return header object, that should be then applied manually. How should I designate in that object, that headers are to be removed? I can set them to undefined or whatever, but then people may accidentally place undefined into setHeader like this res.setHeader('Content-Type', undefined) so undefined will be converted to string and the header will have been set to undefined instead of being removed. Any thoughts?

@mcollina

Uzlopak · 2023-07-31T21:21:47Z

Yeah, when i think about it, we could do instead of etagL > mL this etagL - mL === 2

serzero2007 · 2023-07-31T21:33:12Z

@Uzlopak serzero2007@7ec36db

Still working:

test('if-none-match', async function (t) {
  const result1 = await send({ headers: {} }, '/name.txt', { root: fixtures })
  t.strictSame(result1.status, 200)

  const result2a = await send({ headers: { 'if-none-match': result1.headers.ETag } }, '/name.txt', { root: fixtures })
  t.strictSame(result2a.status, 304)

  const result2b = await send({ headers: { 'if-none-match': result1.headers.ETag.slice(2) } }, '/name.txt', { root: fixtures })
  t.strictSame(result2b.status, 304)

  const result3 = await send({ headers: { 'if-none-match': result1.headers.ETag + 'corrupt' } }, '/name.txt', { root: fixtures })
  t.strictSame(result3.status, 200)
})

By the way, why this perf opt is important? Doesn't V8 itself compare strings the smart way by first comparing their lengths?

serzero2007 · 2023-07-31T21:48:22Z

@Uzlopak

const test_string1 = Array.from(10000, i => `${i}`).join()
const test_string2 = Array.from(10000, i => `${i+1}`).join()

let k;
console.time("isLengthEqual");
for (let i = 0; i < 10000000; i++) {
    const isLengthEqual = test_string1.length === test_string2.length
    k = isLengthEqual
}
console.timeEnd("isLengthEqual");

let m;
console.time("isEqual");
for (let i = 0; i < 10000000; i++) {
    const isEqual = test_string1 === test_string2
    m = isEqual
}
console.timeEnd("isEqual");

Result:

isLengthEqual: 7.506ms
isEqual: 6.395ms

k and m are here to make sure V8 will not optimize everything out and skip comparing altogether.

At least it works in firefox:

isLengthEqual: 1294ms - timer ended        debugger eval code:9:9
isEqual: 1371ms - timer ended              debugger eval code:16:9

The same question: is it worth it or am I doing something wrong?

Uzlopak · 2023-07-31T22:25:57Z

Well. v8 is smart. I kind of doubt that the length check is made by v8.

Anyhow. in the checker function the length check makes alot of sense, because the string comparison is usually more expensive than the length comparison. And in the checker we have two times string comparisons.

I personally would not think about perf improvements on this detailed level. First rewrite it so that the tests pass (if possible without modifying the tests much). Then we can improve the perf and benchmark it.

serzero2007 · 2023-07-31T23:06:47Z

@Uzlopak Tests are to be modified to make use of a new API.

An example of test case:

test('if-unmodified-since', async function (t) {
  const result1 = await send({ headers: {} }, '/name.txt', { root: fixtures })

  const lmod = new Date(result1.headers['last-modified'])
  const date = new Date(lmod - 60000).toUTCString()

  const result2 = await send({ headers: { 'if-unmodified-since': date } }, '/name.txt', { root: fixtures })
  t.strictSame(result2.status, 412)

  // TODO: Is it correct?
  const result3 = await send({ headers: { 'if-unmodified-since': "corrupted" } }, '/name.txt', { root: fixtures })
  t.strictSame(result3.status, 200)

  const content1 = await streamToString2(result1.stream)
  const content3 = await streamToString2(result3.stream)

  t.strictSame(content1, "tobi")
  t.strictSame(content3, "tobi")
})

test('extentions', async function (t) {
  const result1 = await send({ headers: {} }, '/name', { root: fixtures, extensions: "txt" })
  t.strictSame(result1.status, 200)

  const result2 = await send({ headers: {} }, '/name', { root: fixtures, extensions: ["dir", "txt", "html"] })
  t.strictSame(result2.status, 200)

  const result3 = await send({ headers: {} }, '/name', { root: fixtures, extensions: ["html"] })
  t.strictSame(result3.status, 200)

  const content1 = await streamToString2(result1.stream)
  const content2 = await streamToString2(result2.stream)
  const content3 = await streamToString2(result3.stream)

  t.strictSame(content1, "tobi")
  t.strictSame(content2, "tobi")
  t.strictSame(content3, "<p>tobi</p>")

  const result4 = await send({ headers: {} }, '/name/', { root: fixtures, extensions: ["dir", "txt", "html"] })
  t.strictSame(result4.status, 404)

  const result5 = await send({ headers: {} }, '/name.html/', { root: fixtures, extensions: ["dir", "txt", "html"] })
  t.strictSame(result5.status, 404)
})

68bf54c

I will try to adapt all tests from the original code, but you will need to double check that everything is ok.

serzero2007 · 2023-08-01T00:06:32Z

For now I wrote my own tests to get full test coverage. Check it here:
serzero2007@7e11d95

Uzlopak · 2023-08-01T21:33:35Z

You want to provide a PR so that we can review?

serzero2007 · 2023-08-01T21:55:14Z

It's a work in progress now. You can review current state at https://github.com/serzero2007/send/tree/proofofconceptnewapi
I can create PR, but I see no sense in it for now. Do you want me to create PR with current code?

mcollina · 2023-11-30T16:37:44Z

@serzero2007 I only saw this message now. Of course, create a PR!

gurgunday · 2023-11-30T17:06:22Z

Wow, somehow I missed it too

It looks interesting

Uzlopak changed the title ~~Refactor to the move the need for a PassThrought object~~ Refactor to remove the need for a PassThrought object Jan 14, 2023

Uzlopak changed the title ~~Refactor to remove the need for a PassThrought object~~ Refactor to remove the need for a PassThrough stream Jan 14, 2023

Uzlopak mentioned this issue Jan 15, 2023

Make this module up-to-date and to the Fastify standard #2

Closed

serzero2007 mentioned this issue Aug 1, 2023

Improve Etag performance by caching fs.stats #36

Open

2 tasks

mcollina mentioned this issue Feb 2, 2024

abstract the plugin to provide support of virtual file system fastify/fastify-static#432

Open

2 tasks

gurgunday mentioned this issue May 3, 2024

Fix for files with % in filename fastify/fastify-static#452

Merged

4 tasks

climba03003 mentioned this issue Jul 6, 2024

refactor!: new implementation #77

Merged

4 tasks

mcollina closed this as completed in #77 Jul 9, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Refactor to remove the need for a PassThrough stream #15

Refactor to remove the need for a PassThrough stream #15

mcollina commented Jan 13, 2023

Uzlopak commented Jan 14, 2023

mcollina commented Jan 14, 2023

climba03003 commented Jan 18, 2023 •

edited

Loading

mcollina commented Jan 18, 2023

serzero2007 commented Jul 30, 2023 •

edited

Loading

mcollina commented Jul 30, 2023

serzero2007 commented Jul 30, 2023

Uzlopak commented Jul 30, 2023 •

edited

Loading

serzero2007 commented Jul 30, 2023

mcollina commented Jul 31, 2023

serzero2007 commented Jul 31, 2023 •

edited

Loading

serzero2007 commented Jul 31, 2023 •

edited

Loading

Uzlopak commented Jul 31, 2023

serzero2007 commented Jul 31, 2023 •

edited

Loading

serzero2007 commented Jul 31, 2023 •

edited

Loading

Uzlopak commented Jul 31, 2023

serzero2007 commented Jul 31, 2023 •

edited

Loading

serzero2007 commented Jul 31, 2023 •

edited

Loading

Uzlopak commented Jul 31, 2023

serzero2007 commented Jul 31, 2023 •

edited

Loading

serzero2007 commented Aug 1, 2023

Uzlopak commented Aug 1, 2023

serzero2007 commented Aug 1, 2023 •

edited

Loading

mcollina commented Nov 30, 2023

gurgunday commented Nov 30, 2023

Refactor to remove the need for a PassThrough stream #15

Refactor to remove the need for a PassThrough stream #15

Comments

mcollina commented Jan 13, 2023

Uzlopak commented Jan 14, 2023

mcollina commented Jan 14, 2023

climba03003 commented Jan 18, 2023 • edited Loading

mcollina commented Jan 18, 2023

serzero2007 commented Jul 30, 2023 • edited Loading

mcollina commented Jul 30, 2023

serzero2007 commented Jul 30, 2023

Uzlopak commented Jul 30, 2023 • edited Loading

serzero2007 commented Jul 30, 2023

mcollina commented Jul 31, 2023

serzero2007 commented Jul 31, 2023 • edited Loading

serzero2007 commented Jul 31, 2023 • edited Loading

Uzlopak commented Jul 31, 2023

serzero2007 commented Jul 31, 2023 • edited Loading

serzero2007 commented Jul 31, 2023 • edited Loading

Uzlopak commented Jul 31, 2023

serzero2007 commented Jul 31, 2023 • edited Loading

serzero2007 commented Jul 31, 2023 • edited Loading

Uzlopak commented Jul 31, 2023

serzero2007 commented Jul 31, 2023 • edited Loading

serzero2007 commented Aug 1, 2023

Uzlopak commented Aug 1, 2023

serzero2007 commented Aug 1, 2023 • edited Loading

mcollina commented Nov 30, 2023

gurgunday commented Nov 30, 2023

climba03003 commented Jan 18, 2023 •

edited

Loading

serzero2007 commented Jul 30, 2023 •

edited

Loading

Uzlopak commented Jul 30, 2023 •

edited

Loading

serzero2007 commented Jul 31, 2023 •

edited

Loading

serzero2007 commented Jul 31, 2023 •

edited

Loading

serzero2007 commented Jul 31, 2023 •

edited

Loading

serzero2007 commented Jul 31, 2023 •

edited

Loading

serzero2007 commented Jul 31, 2023 •

edited

Loading

serzero2007 commented Jul 31, 2023 •

edited

Loading

serzero2007 commented Jul 31, 2023 •

edited

Loading

serzero2007 commented Aug 1, 2023 •

edited

Loading