Implement per-Dataset I/O scheduling #2

Open
mmomtchev opened this issue Jun 4, 2021 · 2 comments

@mmomtchev
Owner

Some use cases currently lead to significantly degraded I/O performance through thread starvation, or even block the event loop in async mode. ASYNCIO.md describes how to avoid these situations, but users cannot be expected to understand the internals of the project; requiring that defeats the point of having an abstraction layer on top of GDAL in the first place.

All of these problems can be solved by implementing per-Dataset I/O queues and replacing Nan::AsyncWorker with another implementation which schedules I/O operations.

This mechanism:

  • Must not consume a thread slot per Dataset, as there can be many more Datasets than slots in the thread pool
  • Must be fair to avoid starvation, i.e. an application constantly reading from 5 Datasets on a default Node.js thread pool with 4 threads must read (almost) uniformly from all Datasets
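
The two requirements above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the project's implementation: `FairScheduler`, `Op`, and `simulate` are hypothetical names, the thread pool is modelled as a slot counter, and each Dataset is assumed to run at most one operation at a time. Datasets with pending work wait in a round-robin ring, so no Dataset holds a slot while idle and none can be starved.

```typescript
// Hypothetical sketch: none of these names exist in node-gdal-async.
type Op = { dataset: string; name: string };

class FairScheduler {
  private queues = new Map<string, Op[]>(); // one FIFO queue per Dataset
  private ready: string[] = [];             // Datasets with pending work, round-robin order
  private busy = new Set<string>();         // Datasets with an operation in flight

  constructor(private readonly slots: number) {}

  enqueue(op: Op): void {
    let queue = this.queues.get(op.dataset);
    if (queue === undefined) {
      queue = [];
      this.queues.set(op.dataset, queue);
    }
    queue.push(op);
    // A Dataset appears in the ready ring at most once and never while busy.
    if (!this.busy.has(op.dataset) && this.ready.indexOf(op.dataset) === -1) {
      this.ready.push(op.dataset);
    }
  }

  // Start as many operations as there are free slots, one per ready Dataset.
  dispatch(): Op[] {
    const started: Op[] = [];
    while (this.busy.size < this.slots && this.ready.length > 0) {
      const dataset = this.ready.shift()!;
      started.push(this.queues.get(dataset)!.shift()!);
      this.busy.add(dataset);
    }
    return started;
  }

  // Free the slot; a Dataset with more work goes to the back of the ring.
  complete(op: Op): void {
    this.busy.delete(op.dataset);
    if (this.queues.get(op.dataset)!.length > 0) {
      this.ready.push(op.dataset);
    }
  }
}

// Run the scheduler in rounds and record how evenly the Datasets are served.
function simulate(datasets: string[], opsPerDataset: number, slots: number) {
  const scheduler = new FairScheduler(slots);
  const counts = new Map<string, number>();
  for (const dataset of datasets) {
    counts.set(dataset, 0);
    for (let i = 0; i < opsPerDataset; i++) {
      scheduler.enqueue({ dataset, name: `read #${i}` });
    }
  }
  let rounds = 0;
  let maxSpread = 0; // largest gap between the most and least served Dataset
  for (let batch = scheduler.dispatch(); batch.length > 0; batch = scheduler.dispatch()) {
    for (const op of batch) {
      counts.set(op.dataset, counts.get(op.dataset)! + 1);
      scheduler.complete(op);
    }
    const values = Array.from(counts.values());
    maxSpread = Math.max(maxSpread, Math.max(...values) - Math.min(...values));
    rounds++;
  }
  return { counts, rounds, maxSpread };
}
```

Calling `simulate(["A", "B", "C", "D", "E"], 8, 4)` models the 5 Datasets on 4 threads case: in this simplified model, where every operation takes one round, the Datasets never differ by more than one completed read.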
@mmomtchev mmomtchev added the enhancement New feature or request label Jun 4, 2021
@mmomtchev mmomtchev added this to the 3.3 milestone Jun 4, 2021
@mmomtchev mmomtchev changed the title Impelement per-Dataset I/O scheduling Implement per-Dataset I/O scheduling Jun 4, 2021
@mmomtchev mmomtchev removed this from the 3.3 milestone Jun 16, 2021
@mmomtchev
Owner Author

The main challenge in supporting per-Dataset task queues is that libuv work can be queued only from the main thread.

The current queuing/multi-threading model of Node.js/Nan (or Node.js/N-API) provides a framework for executing a task in a background thread (selected from a pool) and then queuing a callback with the result on the event loop. It also supports sending best-effort progress callbacks, which run only if the main thread is idle at that moment. The background thread cannot schedule more libuv work and cannot continue dequeuing the dataset's operations; it must wait for the main thread to schedule the next operation. Currently, one JS async function call = one libuv async task.

This framework is to be replaced with a new one that supports running multiple tasks (the dataset queues) with separate async contexts and then queuing the callbacks on the event loop to be run on the main thread, i.e. multiple JS async function calls on the same dataset are to be executed by one libuv async task.
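
A rough model of the proposed behaviour, written as a single-threaded TypeScript simulation rather than the real C++/libuv code: `DatasetQueue`, `mainLoop`, `startBackgroundTask`, and `demo` are all hypothetical names, the thread pool is a plain array of tasks, and the locking a real multi-threaded implementation would need is omitted. The point it tries to show is that several calls on the same dataset start only one background task, while each call still gets its own callback queued for the main thread.

```typescript
// Hypothetical sketch: a simulation of the idea, not node-gdal-async or libuv code.
type Callback = () => void;

// Stand-in for the event loop's callback queue (run on the main thread).
const mainLoop: Callback[] = [];

class DatasetQueue {
  private pending: { run: () => unknown; done: (result: unknown) => void }[] = [];
  private scheduled = false;
  tasksStarted = 0; // number of simulated background tasks

  // `startBackgroundTask` stands in for queuing one work item on the thread pool.
  constructor(private readonly startBackgroundTask: (task: Callback) => void) {}

  // One JS async call: queue the operation together with its own callback.
  call(run: () => unknown, done: (result: unknown) => void): void {
    this.pending.push({ run, done });
    if (!this.scheduled) {
      this.scheduled = true;
      this.tasksStarted++;
      this.startBackgroundTask(() => this.drain());
    }
  }

  // Runs in the (simulated) background thread: keeps dequeuing this dataset's
  // operations and posts each result back as a separate main-thread callback.
  // A real implementation would need a lock around `pending` and `scheduled`.
  private drain(): void {
    let job = this.pending.shift();
    while (job !== undefined) {
      const result = job.run();
      const done = job.done;
      mainLoop.push(() => done(result));
      job = this.pending.shift();
    }
    this.scheduled = false;
  }
}

function demo(): { tasksStarted: number; results: string[] } {
  const background: Callback[] = []; // stands in for the thread pool
  const dataset = new DatasetQueue((task) => background.push(task));
  const results: string[] = [];
  for (const band of ["band 1", "band 2", "band 3"]) {
    dataset.call(() => `read ${band}`, (r) => results.push(String(r)));
  }
  background.forEach((task) => task());                 // one task drains the whole queue
  mainLoop.splice(0).forEach((callback) => callback()); // main thread delivers the results
  return { tasksStarted: dataset.tasksStarted, results };
}
```

In `demo()`, three calls on the same dataset are served by a single simulated background task, and the three callbacks are delivered in call order.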

@mmomtchev mmomtchev added this to the 3.5 milestone Nov 9, 2021
@mmomtchev
Owner Author

This depends on libuv/libuv#3429
