Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] fs promises #17739

Closed
wants to merge 3 commits into from
Closed

[WIP] fs promises #17739

wants to merge 3 commits into from

Conversation

jasnell
Copy link
Member

@jasnell jasnell commented Dec 18, 2017

This is a work in progress continuation of the prototyping work started originally in #15485.

Ping @MylesBorins @addaleax @mcollina

Checklist
  • make -j4 test (UNIX), or vcbuild test (Windows) passes
  • tests and/or benchmarks are included
  • documentation is changed or added
  • commit message follows commit guidelines
Affected core subsystem(s)

fs

@jasnell jasnell added experimental Issues and PRs related to experimental features. fs Issues and PRs related to the fs subsystem / file system. wip Issues and PRs that are still a work in progress. promises Issues and PRs related to ECMAScript promises. labels Dec 18, 2017
@nodejs-github-bot nodejs-github-bot added c++ Issues and PRs that require attention from people who are familiar with C++. lib / src Issues and PRs related to general changes in the lib or src directory. labels Dec 18, 2017
statValues[8], statValues[9] < 0 ? undefined : statValues[9],
statValues[10], statValues[11], statValues[12],
statValues[13]);
function statsFromValues(values = statValues) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change is necessary because using an async function for stat operations means we have a race condition on the use of statValues when there's just a single TypedArray. So, for the normal callback case, we keep the single shared typed array, but for the promises version, we allocate one TypedArray per Promise. It's a bit more expensive for the promises case, but keeps things status quo for the callback case.

lib/fs.js Outdated
yield buffer.slice(0, length);
}
}
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mcollina ... take note of this function. It is an async generator used in an async iterator within fs.promises.readFile(). Significantly less complicated than the current callback based implementation.

lib/fs.js Outdated
let position = 0;
const buffer = Buffer.allocUnsafe(kReadFileBufferLength);
while (!eof) {
if (loops > 5) process.exit(0);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is no place where loop is incremented

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, that's a left over... should have removed that.

lib/fs.js Outdated
data : Buffer.from('' + data, options.encoding || 'utf8');
const position = /a/.test(flag) ? null : 0;

writeEverything(fd, buffer, 0, buffer.length, position);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't this be a return?

lib/fs.js Outdated
async function readFd(fd) {
let ret = Buffer.alloc(0);
for await(const chunk of readEverything(fd))
ret = Buffer.concat([ret, chunk]);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm afraid it's bad for performance to concat for each chunk. Is it possible to push to an array and concat once we are out of the loop?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep, whatever works best.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for await is very slow at the moment

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep... part of my goal is to make this available then see what @bmeurer and team can do in terms of making it faster.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jasnell this would make the promise API prohibitebly expensive to use though.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While the API is still experimental, performance is not the priority. If we're not able to get the performance optimized adequately, then we can move to a different approach.

lib/fs.js Outdated
const buffer = Buffer.allocUnsafe(kReadFileBufferLength);
while (!eof) {
const { length } =
await fs.promises.read(fd, buffer, 0, buffer.length, position);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’d drop the , position if it isn’t necessary… without it, this should work on un-seekable files (e.g. TTYs, pipes) as well?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, had that on my list to work through

lib/fs.js Outdated
}

fs.promises = {
async access(path, mode = fs.F_OK) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you set fs.access[util.promisify.custom] = fs.promises.access; etc.?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep. I will be

src/node_file.cc Outdated
.ToLocalChecked(),
AsyncWrap::PROVIDER_FD),
fd_(fd) {
CHECK_GE(fd, 3); // Do not allow wrapping of stdio fd
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To keep myself from doing silly things until it's a bit more baked.

src/node_file.cc Outdated
FillStatsArray(env->fs_stats_field_array(),
static_cast<const uv_stat_t*>(req->ptr));
req_wrap->Resolve(Undefined(req_wrap->env()->isolate()));
req_wrap->FillStatsArray(static_cast<const uv_stat_t*>(req->ptr));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sidenote, using &req->statbuf will let you get rid of the cast + remove one layer of indirection

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call

src/node_file.h Outdated
using v8::Undefined;
using v8::Value;

namespace fs {

class FSReqWrap: public ReqWrap<uv_fs_t> {
enum Ownership { COPY, MOVE };
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you make this enum class? I think that would be a bit more readable since COPY and MOVE are pretty short identifiers and it’s not obvious what kind of thing they refer to…

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also good call

async fchmod(fd, mode) {
mode = modeNum(mode);
if (fd == null || typeof fd !== 'object' || !Number.isInteger(fd.fd))
throw new errors.TypeError('ERR_INVALID_ARG_TYPE', 'fd', 'FD');
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using ERR_INVALID_FD might be more suitable for the errors like this.

eof = true;
yield buffer.slice(0, length);
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is going to be very slow. Do we know if there is some improvements in the area from the V8 team?
Can you use the same class approach I used for Async Iterators? It ended up being 2x the async generator function.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes this can use the class approach, but I'm purposefully leaving this in so that folks my @bmeurer can take a look and figure out how to make it faster. I will be switching it to the class approach a bit later.

length -= written;
if (position !== null)
position += written;
return writeEverything(fd, buffer, offset, length, position);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the buffer is big enough, it could lead to a stack overflow. Can this be done as a loop?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that it returns a Promise, does it overflow the stack? :-)

That said, I'm already planning to rework this function. This was a temporary impl

@jasnell
Copy link
Member Author

jasnell commented Dec 21, 2017

@bmeurer / @fhinkel ... question for you.

With this implementation currently... within an async function, I create an instance of FSReqPromise, which is a handle object provided by process.binding('fs'). On construction, it creates a V8 Promise at the C++ layer. The async function then calls back down to C++, passing the FSReqPromise along with it. At the C++ layer, we detect that this handle thing is an FSReqPromise and instead of calling a callback, we resolve the Promise when the operation completes.

It works rather nicely as is, however... it is a bit inefficient because we already know that we are running within an async function, which we know will create it's own Promise. What I would like to be able to do is avoid creating the FSReqPromise at all and have it so that the process.binding('fs') exposed functions know they are executing within an Async Function and give them the ability to access the Promise that is created.

This would save two additional allocations (one for the FSReqPromise handle and one for the v8::Promise it creates) and would eliminate one boundary cross per operation.

Thing is, I don't know if this would be possible within V8 (I know it's not possible now).

For instance.... something like:

static void Open(const FunctionCallbackInfo<Value>& args) {
  Environment* env = Environment::GetCurrent(args);
  Local<Promise> promise - args.Promise();
  ...
  args.GetReturnValue().Set(promise);
}

@addaleax
Copy link
Member

@jasnell I think that would basically be equivalent to having native async functions, right? I can’t tell you whether V8 wants it or not, but it should certainly be doable…?

@jasnell
Copy link
Member Author

jasnell commented Dec 21, 2017

Yes, that's exactly it. I want the native code to be able to know that it's running within an async function() without us having to guess and without us having to pass in additional context that wastes allocations.

@jasnell
Copy link
Member Author

jasnell commented Dec 21, 2017

Or, perhaps a different approach... having the ability to explicitly create Async Functions at the native level with an AsyncFunctionCallback/AsyncFunctionCallbackInfo construct... e.g.

void AsyncThing(const AsyncFunctionCallbackInfo<Value>& args) {
  Local<Promise> promise = args.Promise();
  // resolve sync or async here... setting the return value like below would resolve sync
  args.GetReturnValue().Set(1);
}

such that the function itself returns a Promise and the args.GetReturnValue().Set(1) resolves the promise, and any exceptions thrown on the isolate from within the function trigger a rejection.

However it works, I just want to avoid having to allocate the FSReqPromise handle and the additional Promise.

@bmeurer
Copy link
Member

bmeurer commented Dec 21, 2017

@jasnell That does sound possible. I don't see why async API functions shouldn't be supported, but I'm not the expert here.

Let me redirect this question to @hashseed, @caitp, @schuay and @gsathya. WDYT?

@gsathya
Copy link
Member

gsathya commented Dec 22, 2017

Oh, exciting to see all this progress! Nice work @jasnell 😄

Async API functions can be supported but I don't see the need here. Looking at methods in promises -- almost all the async methods can just be normal methods. They don't use await, they just return a promise, so there's no need for them to be async.

A couple of them do use await in that case you don't want to use the promise created by the async function because that could mess with the behavior/timing of the async function.

On construction, it creates a V8 Promise at the C++ layer. The async function then calls back down to C++, passing the FSReqPromise along with it.

The double call out to C++ can be avoided by having the binding create and return the promise. This would improve performance greatly. For example:

Instead of doing this:

    const req = new FSReqPromise();
    binding.futimes(fd, atime, mtime, req);
    return req.promise;

we could do:

    const req = binding.futimes(fd, atime, mtime);
    return req.promise;

With this change, there's only one call out to C++.

@jasnell
Copy link
Member Author

jasnell commented Dec 22, 2017

almost all the async methods can just be normal methods. They don't use await, they just return a promise, so there's no need for them to be async.

Yeah, I've been going back and forth on this a bit. One of the key reasons they are async functions now is to simplify the error handling. The nullCheck() function, for instance, will throw an exception which is automatically handled as a promise rejection within the async function. Making those regular functions means we have to do a slight amount more work with those (obviously it's fairly trivial but still.)

you don't want to use the promise created by the async function ...

Yeah, as I was digging in to it more I realized that also. It would still be good to have the ability to create an async function at the native layer so that I can do something like:

void AsyncThing(const AsyncFunctionCallbackInfo<Value>& args) {
  args.GetReturnValue().Set(123);
}

At the C++ level and...

const binding = process.binding('foo');
await binding.asyncThing();

At the JavaScript level without having to worry about the explicit Promise creation stuff within C++.

With this change, there's only one call out to C++

Yep, and I that's ultimately where I'm going to go. The reason I've gone with the FSReqPromise model is that it is the least disruptive to the current model used by the code. The handle object is used to provide async context and continuation local storage in some cases. For instance, in some cases, preserving references to a Buffer to keep it from being garbage collected until the async operation completes. Changing the binding to return a Promise if the handle is not provided is certainly quite doable but changes some of the assumptions and requires some larger changes elsewhere. Definitely not impossible by any stretch of the imagination and likely the right way to go.

@caitp
Copy link
Contributor

caitp commented Dec 22, 2017

How would you model the “resume” strategy in C++?

@jasnell
Copy link
Member Author

jasnell commented Dec 22, 2017

How would you model the "resume" strategy in C++

TBD, I think. I'm not yet 100% sure.

@benjamingr
Copy link
Member

benjamingr commented Dec 24, 2017

@caitp C# models them (effectively) with a class (rather than a function) with a state variable and an entry hook.

class MyAsyncFunction : AsyncFunction { // preferably, not with inheritance though
  // actually set automatically by superclass constructor, can be Finished, Thrown (?)
  int state = V8::AsyncFunction::Initial; 
  v8::Value onEnter() {
    // state is the entry number, can also use any
    switch (callCount) { // what are we awaiting?
      case 0: {
          // initial code + return
      }
      case 1: {
         // second call, and so on
      }
    }
  }
};

However I simply don't see any benefit to this over using a regular function and doing this yourself.

boost coroutines (if were used) could provide a more elegant solution - but let's don't :)

@jasnell
Copy link
Member Author

jasnell commented Jan 22, 2018

Closing in favor of #18297

@jasnell jasnell closed this Jan 22, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
c++ Issues and PRs that require attention from people who are familiar with C++. experimental Issues and PRs related to experimental features. fs Issues and PRs related to the fs subsystem / file system. lib / src Issues and PRs related to general changes in the lib or src directory. promises Issues and PRs related to ECMAScript promises. wip Issues and PRs that are still a work in progress.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants