Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

v1 Pool issue / usage issue #108

Closed
andywer opened this issue Jun 8, 2019 · 24 comments
Closed

v1 Pool issue / usage issue #108

andywer opened this issue Jun 8, 2019 · 24 comments

Comments

@andywer
Copy link
Owner

andywer commented Jun 8, 2019

As originally reported by @aarmora in #100:

Hey @andywer! I decided to turn my constant twitter barrage into official stuff here. I'm loving v1 and I have been messing with Pool today. The events logged say that the size is 2 in my case but I don't seem to see any queueing.

I would have expected it to show two blocks of all the items, output the numbers looped through for those two arrays, and then two more blocks of all the items. Instead it looks like it's running everything immediately.

Example code:

// Index.ts

import { spawn, Thread, Worker, Pool } from 'threads';

(async () => {
    let spawnCalled = 0;
    const spawnLooper = () => {
        spawnCalled++;
        return spawn<(items: any[]) => Promise<any>>(new Worker('./../../../../dist/looper.js'));
    };

    const pool = Pool(spawnLooper, 2);

    pool.events().subscribe(console.log);

    const bigList = new Array(100);
    const listWithNumbers: any[] = [];

    for (let i = 0; i < bigList.length; i++) {
        listWithNumbers.push(i);
    }
    console.log('before');

    await pool.queue(async looper => {
        const promises: any[] = [];
        for (let i = 0; i < listWithNumbers.length; i++) {
            promises.push(looper(listWithNumbers.slice(i, i + 25)));
            i += 25;
        }
        await Promise.all(promises);
    });

    await pool.terminate();
    console.log('after', spawnCalled);

})();
// looper.ts
import { expose } from 'threads/worker';

expose(async (items: any[]) => {
    console.log('all the items', items);

    for (let i = 0; i < items.length; i++) {
        await wait(items[i]);
    }

    function wait(i: number) {
        return new Promise((resolve) => {
            setTimeout(() => {
                console.log(i);
                resolve();
            }, 150);
        })
    }
    return Promise.resolve();
});

Image Example:
image

@andywer
Copy link
Owner Author

andywer commented Jun 8, 2019

Thanks for reporting, @aarmora. Moved your comment into this issue here, since it's quite complex 😉

I think it's more of a usability issue. Might be worth reviewing the API, though, if it's confusing to use.

Considering your code: You only call pool.queue() once, so you only schedule one job. That's why everything is executed at once. I think what you were expecting would be implemented as pool.queue()-ing multiple batches of work.

If you have a suggestion how to make the usage clearer, feel free to suggest a change.

@aarmora
Copy link
Contributor

aarmora commented Jun 8, 2019

@andywer Good feedback. If that's the issue, then that totally makes sense and then I just think only the documentation is the issue showing more than one item being queued. I can submit a PR for that once I fully understand how it works.

I tried to implement it how I would expect it per your explanation; queueing three tasks when my pool is set to 2 and I get similar results. It hits all the items three times and then starts looping through the numbers. I would expect it to hit all the items twice and then loop through two sets of numbers and then hit all the items again.

Maybe this is due to how these are promises and I'm not awaiting for them until below so the queue thinks it's finished before the task is actually finished. This doesn't match exactly what the event queue is returning since it does queue, start, complete three times in a row rather than queue, start twice, followed by two completes, then a final queue, start, complete.

image

@aarmora
Copy link
Contributor

aarmora commented Jun 8, 2019

@andywer But actually... isn't my loop calling await pool.queue multiple times as it loops through it?

@andywer
Copy link
Owner Author

andywer commented Jun 8, 2019

Yeah, the out-commented for (...) { await pool.queue(...) } loop looks quite good.

There is one thing that's odd, though. It should rather be like this:

- await pool.queue(async looper => {
-   promises.push(looper(/* ... */))
- })
+ promises.push(pool.queue(async looper => {
+   await looper(/* ... */)
+ }))

Not sure how much of a difference it makes in practice, though.

Idea

There should be a convenience method to wait until the pool has completed all pending jobs. How about something like this?

for (/* ... */) {
  pool.queue(looper => looper(/* ... */))
}
await pool.completed()

@andywer
Copy link
Owner Author

andywer commented Jun 8, 2019

The order of the console output is strange indeed. Got to investigate. Maybe the console output of master and worker thread is not guaranteed to be printed in order.

@aarmora
Copy link
Contributor

aarmora commented Jun 8, 2019

@andywer I am actually intentionally not awaiting those loops so they run concurrently. The idea I'm trying to implement is doing the work normally done in a loop a lot faster by leveraging multiple threads to do it concurrently. Is there a better way to do this?

@andywer
Copy link
Owner Author

andywer commented Jun 8, 2019

Sorry, I had one typo in the first for-loop example - of course you don’t want to await the pool.queue().

Look at what I outlined under "Idea". It's really just a convenient way to not have to promises.push() manually 😉

@aarmora
Copy link
Contributor

aarmora commented Jun 8, 2019

@andywer Yes, having a pool.completed() method would definitely be awesome!

@andywer
Copy link
Owner Author

andywer commented Jun 9, 2019

Try the latest alpha version: 1.0.0-alpha.4

You can now use call spawn() in TypeScript without providing type parameters and the pool now comes with a pool.completed() method.

Let me know if that solves the issue for you. Haven't investigated the strange log output yet, though. You can also try to set the environment variable DEBUG=threads:* to enable debug logging within the library.

@andywer
Copy link
Owner Author

andywer commented Jun 9, 2019

Btw, do you still have issues using the lib with node < 12?

@aarmora
Copy link
Contributor

aarmora commented Jun 10, 2019

@andywer I just tried with node 10 and I still get the following error:

image

@aarmora
Copy link
Contributor

aarmora commented Jun 10, 2019

@andywer alpha.4 worked GREAT with await pool.completed();

Check my test code.

import { spawn, Thread, Worker, Pool } from 'threads';
import { performance } from 'perf_hooks';

(async () => {
    const startTime = performance.now();
    let spawnCalled = 0;
    const spawnLooper = () => {
        spawnCalled++;
        return spawn(new Worker('./../../../../dist/looper.js'));
    };

    const pool = Pool(spawnLooper, 15);

    pool.events().subscribe(console.log);

    const bigList = new Array(1000);
    const listWithNumbers: any[] = [];

    for (let i = 0; i < bigList.length; i++) {
        listWithNumbers.push(i);
    }
    console.log('before');

    for (let i = 0; i < listWithNumbers.length; i++) {
        pool.queue(looper => looper(listWithNumbers.slice(i, i + 25)));
        i += 25;
    }
    await pool.completed();

    await pool.terminate();
    const endTime = performance.now();
    console.log(`Completed after ${endTime - startTime}`, spawnCalled);

})();

The blocks log out as expected when they hit the exposed looper function, matching the pool limit I put on it.

@andywer
Copy link
Owner Author

andywer commented Jun 10, 2019

Cool! 🙌

@andywer I just tried with node 10 and I still get the following error

I think I have a rough idea why it's failing. It's due to Windows: It seems to interpret some path C:\... as a URL, C being interpreted as a the protocol.

I don't have a Windows machine at hand... Could you add some console.log() statements to node_modules/tiny-worker/lib/worker.js to locate the exact call that causes the error and what path was passed? That would help a lot!

@aarmora
Copy link
Contributor

aarmora commented Jun 10, 2019

It appears to be at var exp = isfn ? toFunction(input) : esm ? "require(\"" + input + "\");" : fs.readFileSync(input, "utf8");.

It also doesn't appear to be breaking all the time, oddly enough. Here's what my file looks like and then the terminal output. You can see in the terminal it hits both before and after the fs function but sometimes it only hits above, then it gets the error.

Hope this helps.

image

image

@andywer
Copy link
Owner Author

andywer commented Jun 11, 2019

Hmmm, I think we also need to log what `obj' is before shit hits the fan...

@aarmora
Copy link
Contributor

aarmora commented Jun 11, 2019

Additional logs:

image

Terminal (the first error is in this screenshot):

image

@andywer
Copy link
Owner Author

andywer commented Jun 12, 2019

Thanks, man! I think I finally found the corresponding issue in the node repo: nodejs/node#15374

What node version do you use? Is it an up-to-date node 10.x?

Anyway, we should be able to work around this issue by making the absolute path relative again on Windows.

@andywer
Copy link
Owner Author

andywer commented Jun 12, 2019

PS: Can you try changing the code in tiny-worker's worker.js to:

(...) esm ? "require(\"" + path.relative(cwd, input) + "\")" : (...)

@andywer
Copy link
Owner Author

andywer commented Jun 13, 2019

@aarmora Can you maybe checkout the branch chore/travis-windows and try to run the tests on Windows with node 10? I added a fix for the Windows path issue.

The tests fail with a timeout error in Travis CI on a Windows runner, but I am not sure if an error occurs that is not shown or if it's actually running into a timeout...

@aarmora
Copy link
Contributor

aarmora commented Jun 13, 2019

@andywer I am testing on Node v10.16.0. Changing to path.relative(cwd, input) gives the following with the this path return spawn(new Worker('./../../../../dist/teacher-worker.js'));.

Error: Cannot find module 'dist\teacher-worker.js'
Require stack:
- C:\Users\jbhan\sites\client-teacher-testing\node_modules\tiny-worker\lib\worker.js
    at evalmachine.<anonymous>:1
    at Script.runInThisContext (vm.js:122:20)
    at process.<anonymous> (C:\Users\jbhan\sites\client-teacher-testing\node_modules\tiny-worker\lib\worker.js:89:30)
    at Object.onceWrapper (events.js:286:20)
    at process.emit (events.js:198:13)
right before vm require("dist\teacher-worker.js");

If I adjust the worker path to return spawn(new Worker('./../dist/teacher-worker.js')); I get:

Error: Cannot find module 'node_modules\threadsdistdist\teacher-worker.js'
Require stack:
- C:\Users\jbhan\sites\client-teacher-testing\node_modules\tiny-worker\lib\worker.js
    at evalmachine.<anonymous>:1
    at Script.runInThisContext (vm.js:122:20)
    at process.<anonymous> (C:\Users\jbhan\sites\client-teacher-testing\node_modules\tiny-worker\lib\worker.js:89:30)
    at Object.onceWrapper (events.js:286:20)
    at process.emit (events.js:198:13)
C:\Users\jbhan\sites\client-teacher-testing\node_modules\tiny-worker\lib\worker.js:1

I'm not familiar with path.relative and so without digging into it, it looks like I need another slash in there somewhere.

@aarmora
Copy link
Contributor

aarmora commented Jun 13, 2019

@andywer Running the tests on chore/travis-windows looks good to me. Here are the results:

image

@andywer
Copy link
Owner Author

andywer commented Jun 13, 2019

Cool and thanks a lot! Let's try running the tests again with a 30s timeout then, shall we? 😅🙈

It's ridiculous that it takes so long to spawn a new process with tiny-worker on Windows...

@andywer
Copy link
Owner Author

andywer commented Jun 13, 2019

Smells like a success. Finally!

image

@aarmora
Copy link
Contributor

aarmora commented Jun 13, 2019

@andywer Woo!! Nice! :D

@andywer andywer closed this as completed Jun 13, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants