Rewritten or Restructure #464
Comments
I think the biggest issue is handling big files, as new devs don't think about memory leaks etc.
I call it "Apple style": the user should get a limited number of options, otherwise they can't choose. I agree from the support perspective. We should list the use cases we want to support first.
I kind of agree that a refactoring of this is needed, however I'm not sure exactly in which direction. Most of the recently added features are not safe to use in a generic sense, as keeping files in memory is a bad practice.
The most important use cases and most requested features are the two below: no extra configuration, and a JSON body provided for the users.
Since here we are using
Could we break down some issues into a project for this?
Before more things are removed/changed, I just want to add that I really do like the DX of this plugin. The docs warn you about accumulating files in memory but provide multiple ways to deal with that:
As I said, option 1 (especially with
So, all in all, I'm more interested in modifying/refactoring these lower-level methods and maybe changing the defaults of the plugin.
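As an illustration of the size-limit-plus-buffer pattern described above, here is a minimal sketch assuming the plugin's `attachFieldsToBody` and `limits` options; the `/avatar` route and field names are made up:

```js
const fastify = require('fastify')()

fastify.register(require('@fastify/multipart'), {
  attachFieldsToBody: true,                   // parsed parts end up on request.body
  limits: { fileSize: 1024 * 1024, files: 1 } // bound how much can accumulate in memory
})

fastify.post('/avatar', async function (request, reply) {
  const buffer = await request.body.avatar.toBuffer() // size-bounded buffer kept for the request lifecycle
  const name = request.body.name.value                // ordinary text field
  reply.code(201).send({ name, size: buffer.length })
})
```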
Maybe we should design multipart to be configured "incrementally". You need multipart for processing forms; e.g. a login form does not need file uploads. So we first support, out of the box, processing forms and having the fields attached as attributes of the body object. Then we can determine what to do with the files. Should we handle them out of the box as Buffers assigned to the body? Should we handle them as streams? We should also consider that we could maybe implement a better integration into the Fastify schema system.

In a perfect world, I would register the multipart plugin with maybe some general limits. Maybe every file part is a stream. We supply two stores, like we supply fastify-rate-limit with two stores: one memory and one Redis. So the body gets a getter for the file fields, which instantiates a stream.

Your memory store is the leaky, bad one, like in fastify-rate-limit. You upload the file, the memory store keeps the Buffer in memory. The memory store is a wrapper around a Map: we generate a UUID as key and store the buffer in there. Then the body gets a corresponding property named after the file field, and the getter is basically Readable(Map.get(uuid)). We could enrich that Readable with attributes like size and contentEncoding, and maybe even the key in the Map (which makes more sense with the Redis implementation below).

The Redis store is basically the same. We configure the Redis store and pass it as a configuration to the multipart plugin. We pump the files, keyed by UUIDs, to the Redis instance, and also add a property to the body with a getter to a Redis stream. Then in the handler you can use the stream to pipe it somewhere else. Imho this would be the best. For the memory store and the Redis store you could configure a TTL.

When I used file uploads in Express, I always streamed the files to Redis and passed the Redis key to the other microservices, so the other microservices could handle them and maybe even unlink them after use. The benefit is also that people could e.g. implement their own stores, like a MongoDB GridFS store, a SQLite store, an fs store, or, or, or... Also, by using streams and stores it is easier to test without databases, we kind of encourage people to consider using streams, and it would be easier to implement microservices, as the state would be forced to live outside the Fastify microservice.

I once wrote a Redis stream implementation. Maybe it could be adapted accordingly.
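A rough sketch of that memory-store shape, just to make it concrete (all names here are hypothetical, and a real implementation would need TTL handling and limits):

```js
const { Readable } = require('node:stream')
const { randomUUID } = require('node:crypto')

class MemoryStore {
  constructor () { this.map = new Map() }
  set (buffer) {
    const key = randomUUID()
    this.map.set(key, buffer)   // the "leaky" part: the whole file stays in memory
    return key
  }
  createReadStream (key) {
    const buffer = this.map.get(key)
    const stream = Readable.from([buffer]) // emit the stored Buffer as a single chunk
    stream.key = key                       // expose the key, as suggested above
    stream.size = buffer.length
    return stream
  }
  delete (key) { this.map.delete(key) }    // e.g. driven by a TTL timer
}

// The plugin could then wire the body property roughly like this:
//   Object.defineProperty(body, fieldName, {
//     get: () => store.createReadStream(key)
//   })
```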
Fully agree, things can be a lot cleaner if file processing is opt-in, set either by a global setting or by per-route configuration, like how
I'm digging it, as long as it's consistent with other default content parsers. For instance, if I'm not mistaken, Fastify doesn't throw when a field that wasn't defined in the schema exists in JSON input.

For the rest, I know everyone hates on storing files in memory and all, but I love it and use it very often – it's just too convenient 🤣 And it doesn't have to be leaky and bad. As I described above, by setting an appropriate size limit and error handler, it should always be possible to get rid of a file if necessary while making it easy to keep the validated buffer around during the lifecycle of a request.

So yeah, I'm behind this plan as long as the in-memory store is not left out :D
Yeah, it is just a thought; details can be discussed. The throwing part was just to illustrate that we maybe have to couple the validation more tightly. The preParsing hook happens before preValidation. This could also mean that when you put files into memory or a database or wherever, we will be filling them before we realize that the data is actually garbage and we don't need it.

I looked now, and OpenAPI defines a file in form data with `type: string, format: binary`. Then it would be more like the body gets parsed like a JSON payload and binary fields are streams. The ajv validator could run over it, and you can do whatever you want with the streams.
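To make the schema idea concrete, a hypothetical route definition (not something the current plugin supports) where the multipart body is described like a JSON payload and binary fields are flagged OpenAPI-style so the validator and the parser can tell them apart:

```js
// Hypothetical: the plugin would parse text fields into body properties and
// deliver fields declared as `format: binary` as streams instead of buffering them.
fastify.post('/upload', {
  schema: {
    consumes: ['multipart/form-data'],
    body: {
      type: 'object',
      required: ['name', 'document'],
      properties: {
        name: { type: 'string' },
        document: { type: 'string', format: 'binary' } // a file part, delivered as a stream
      }
    }
  }
}, async function (request, reply) {
  // request.body.name would be a plain string, request.body.document a Readable
})
```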
I believe everyone is skipping this part of the comment (
You should neither use …

Although buffering in memory is not a good choice, I still provided it, and auto file download, in my plugin.
I am always waiting for https://github.com/tc39/proposal-explicit-resource-management to land. At this stage, I would either auto-clean ourselves or mention it in the documentation.
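For illustration, a speculative sketch of how that proposal could be used for temporary upload files; the helper name is made up, Symbol.asyncDispose needs a recent Node version, and the `await using` part is commented out because the syntax needs a runtime or transpiler that already supports the proposal:

```js
const { unlink } = require('node:fs/promises')

// Wrap a saved upload path in a disposable object (hypothetical helper).
function asDisposableFile (filepath) {
  return {
    filepath,
    async [Symbol.asyncDispose] () {
      await unlink(this.filepath) // auto clean-up, even if the handler throws
    }
  }
}

// In a handler, once `await using` is available:
//   await using file = asDisposableFile(request.body.document)
//   ... work with file.filepath ...
// the file is unlinked automatically when the block exits.
```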
Hmm, yes, if the file is only a string pointing to the path where it is stored, then it is definitely easier. You could even define a protocol like file://tmp/bla.txt or redis://key or mongogridfs://mongoid or memory://bla.txt.

Somehow I still like my idea of having them as streams, as it means the implementer doesn't have to think about what to do and can still get the data easily in the handler by reading stream.key. But this would also mean that if I want to use the data in my other microservice, I also need the right ReadableStream. Not that nice again.

I get your point regarding using the filesystem. The filesystem is definitely cheaper. It maybe also has the benefit that people can activate antivirus programs on the filesystem to scan the incoming folder.
It would make things a lot easier and cheaper. I'm just questioning whether it can be depended on, since it's not an always-available thing. For instance, I host a monolith on DigitalOcean's App Platform, but I'm not sure if this feature exists there. I will check it out, though.
Agreed in principle, as long as it's simple enough to use at the end of the day. Right now, with …

My only concern is that the limits could cause data loss if a buffer is not used and gotten rid of fast enough. With the current setup, I'm having no difficulties letting people upload files on a 512 MB server.
This is so cool! We should definitely use it when it's implemented.
Placing … However, I would like to split it into two usages.

```js
fastify.post('/', async function (request, reply) {
  // access files
  request.files
  // access body
  // note that file fields will exist in body and will become the file path saved on disk
  request.body
})
```

Handle the files yourself:

```js
fastify.post('/', async function (request, reply) {
  // you can use an async iterator for file handling
  for await (const { type, name, value, info } of request.multipart()) {
    switch (type) {
      case 'field': {
        console.log(name, value)
        break
      }
      case 'file': {
        console.log(name, value, info.filename)
        // value is a Readable
        value.resume()
        break
      }
    }
  }
})
```
I would recommend we split the logic into two modules: one low-level module that just parses the incoming multipart data and provides a low-level stream API, and one that uses that module to provide a high-level API (and stores the files on disk in between). I solved the "on exit" problem with https://www.npmjs.com/package/on-exit-leak-free, which should allow us to keep a global list of files and remove them in case of a crash.
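A minimal sketch of that crash-cleanup idea, assuming on-exit-leak-free's `register(object, shutdownFn)` API (the `tmpFiles` set and the paths are hypothetical):

```js
const { register } = require('on-exit-leak-free')
const { unlinkSync } = require('node:fs')

const tmpFiles = new Set() // global list of files the high-level API wrote to disk

function cleanup (files) {
  // best-effort removal when the process exits or crashes
  for (const file of files) {
    try { unlinkSync(file) } catch {}
  }
}

register(tmpFiles, cleanup) // cleanup(tmpFiles) runs on exit without leaking the set

// The high-level API would do roughly:
//   tmpFiles.add(savedPath)    // after writing a part to disk
//   tmpFiles.delete(savedPath) // once the request is done and the file consumed
```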
Don't we have that separation already? We have busboy, which does the incoming multipart data parsing, and fastify/multipart for the integration into Fastify.
I don't think so, integrating it with Fastify is worth some code. I would not use the API that stores files on disk by default. The old event-based API was enough for all my needs.
I would really like the new API to look like this, based on the browser's FormData. We can keep options like … Example usage:

```js
async handler(req, res) {
  const form = await req.formData();
  const email = form.get("email")?.toString();
  // ...
},
```
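A possible extension of that sketch for file fields (the "avatar" field name is made up), assuming file entries come back as standard File/Blob objects, as they do with the WHATWG FormData:

```js
async function handler (req, res) {
  const form = await req.formData()
  const avatar = form.get('avatar')
  if (avatar && typeof avatar.arrayBuffer === 'function') {
    const buffer = Buffer.from(await avatar.arrayBuffer()) // buffer the whole file ...
    // ... or stream it instead: avatar.stream() returns a web ReadableStream
    res.send({ size: buffer.length })
  }
}
```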
Prerequisites
🚀 Feature Proposal
We should reduce the methods to something that is necessary. For example, `parts`, `files` and `file` do nearly the same thing but are confusing. We should either provide a parsing method directly, or an `async iterator` for advanced users. The current `request.body` object is not user-friendly, which leads to multiple config options (e.g. `keyValues`).

Motivation
With the growth of features, and deprecated functions that were never removed, the current codebase is messy and not friendly to beginners. I found most people are not familiar with how streams work and ask for a simple solution multiple times. That's why I created `fastify-formidable` and `fastify-busboy`. I believe we can do something to provide a more generic interface that is not tied to a single parsing engine. So, I created `@kakang/fastify-multipart` and seek advice.

Example
No response