Implement per-Dataset I/O scheduling #2

mmomtchev · 2021-06-04T12:04:35Z

Some use cases currently lead to significantly degraded I/O performance through thread starvation, or even to blocking the event loop in async mode. ASYNCIO.md describes the steps needed to avoid these situations, but users cannot be expected to understand the internals of the project: that would defeat the point of having an abstraction layer on top of GDAL in the first place.

All of these problems can be solved by implementing per-Dataset I/O queues and replacing Nan::AsyncWorker with another implementation which schedules I/O operations.

This mechanism:

- Must not occupy a thread slot per Dataset, as there can be many more Datasets than slots in the thread pool
- Must be fair to avoid starvation, i.e. an application constantly reading from 5 Datasets on a default Node.js thread pool with 4 threads must read (almost) uniformly from all Datasets
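
The two requirements above can be illustrated with a small simulation. This is only a sketch under assumptions, not the project's implementation: `FairScheduler`, `Task` and `simulate` are hypothetical names, and thread-pool slots are modelled as a plain counter rather than real libuv threads. It keeps one FIFO per Dataset, shares a fixed number of slots among all of them, and rotates Datasets round-robin so that none is starved.

```typescript
// Hypothetical sketch: none of these names exist in the project.
type Task = { dataset: string; op: string };

class FairScheduler {
  private queues = new Map<string, Task[]>(); // one FIFO per Dataset
  private order: string[] = [];               // round-robin rotation of Dataset ids
  private busy = new Set<string>();           // Datasets with an operation in flight
  private running = 0;                        // slots currently in use

  constructor(private readonly slots: number) {}

  enqueue(task: Task): void {
    let q = this.queues.get(task.dataset);
    if (q === undefined) {
      q = [];
      this.queues.set(task.dataset, q);
      this.order.push(task.dataset);
    }
    q.push(task);
  }

  // Returns the next task to start, or undefined if no slot is free
  // or no Dataset is eligible (idle with pending work).
  next(): Task | undefined {
    if (this.running >= this.slots) return undefined;
    for (let i = 0; i < this.order.length; i++) {
      const id = this.order[i];
      const q = this.queues.get(id)!;
      if (this.busy.has(id) || q.length === 0) continue;
      // Move the chosen Dataset to the back of the rotation.
      this.order.splice(i, 1);
      this.order.push(id);
      this.busy.add(id);
      this.running++;
      return q.shift();
    }
    return undefined;
  }

  done(task: Task): void {
    this.busy.delete(task.dataset);
    this.running--;
  }
}

// Runs `rounds` rounds: fill every free slot, then complete everything in flight.
// Returns how many operations were executed per Dataset.
function simulate(
  datasets: string[], readsEach: number, slots: number, rounds: number
): Map<string, number> {
  const s = new FairScheduler(slots);
  for (const d of datasets) {
    for (let i = 0; i < readsEach; i++) s.enqueue({ dataset: d, op: `read#${i}` });
  }
  const executed = new Map<string, number>();
  for (const d of datasets) executed.set(d, 0);
  for (let r = 0; r < rounds; r++) {
    const inFlight: Task[] = [];
    let t: Task | undefined;
    while ((t = s.next()) !== undefined) inFlight.push(t);
    for (const f of inFlight) {
      executed.set(f.dataset, executed.get(f.dataset)! + 1);
      s.done(f);
    }
  }
  return executed;
}
```

With 5 Datasets and 4 slots, `simulate(["A", "B", "C", "D", "E"], 10, 4, 5)` executes 20 operations, 4 per Dataset: a different Dataset sits out each round, so no single one is starved.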

mmomtchev · 2021-11-09T11:49:05Z

The main challenge that must be solved to support per-Dataset task queues is the fact that queuing libuv work is possible only from the main thread.

The current queuing/multi-threading model of Node.js/Nan (or Node.js/N-API) implements a framework that allows executing a task in a background thread (selected from a pool) and then queuing a callback on the event loop with the result. It also allows sending best-effort progress callbacks, which are run only if the main thread is idle at that moment. This background thread cannot schedule more libuv work and cannot continue dequeuing the dataset; it must wait for the main thread to schedule the next operation. Currently, one JS async function call = one libuv async task.

This framework is to be replaced with a new one that supports running multiple tasks (the dataset queues) with separate async contexts and then queuing the callbacks on the event loop to be run on the main thread, i.e. multiple JS async function calls on the same dataset are to be executed by one libuv async task.
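
The intended shape, several calls on one dataset served by a single background task, can be modelled as follows. This is a rough sketch under assumptions and not the planned C++ implementation: `DatasetQueue`, `Job` and the injected `schedule` function are hypothetical, `schedule` merely stands in for queuing one task on the thread pool, and each job's callback is invoked directly where the real framework would queue it on the event loop.

```typescript
// Hypothetical sketch: these names are illustrative, not project or libuv APIs.
type Job = { run: () => unknown; callback: (result: unknown) => void };

class DatasetQueue {
  private jobs: Job[] = [];
  private scheduled = false; // true while a drain task is queued or running
  drainTasks = 0;            // how many background tasks were queued in total

  // `schedule` stands in for queuing one task on the thread pool.
  constructor(private readonly schedule: (drain: () => void) => void) {}

  // Called once per JS async function call on this dataset.
  submit(run: () => unknown, callback: (result: unknown) => void): void {
    this.jobs.push({ run, callback });
    if (!this.scheduled) {
      // Only the first pending call queues a background task;
      // later calls are picked up by that same task.
      this.scheduled = true;
      this.drainTasks++;
      this.schedule(() => this.drain());
    }
  }

  // Runs in the (simulated) background task: dequeue until the queue is empty.
  private drain(): void {
    let job: Job | undefined;
    while ((job = this.jobs.shift()) !== undefined) {
      // Each job keeps its own callback, i.e. its own async context.
      job.callback(job.run());
    }
    this.scheduled = false;
  }
}
```

For example, with a `schedule` function that just collects drain closures into an array, three consecutive `submit` calls produce a single queued task, and running that task delivers all three results in submission order.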

mmomtchev · 2022-02-03T16:55:56Z

This depends on libuv/libuv#3429

mmomtchev added the enhancement New feature or request label Jun 4, 2021

mmomtchev added this to the 3.3 milestone Jun 4, 2021

mmomtchev changed the title ~~Impelement per-Dataset I/O scheduling~~ Implement per-Dataset I/O scheduling Jun 4, 2021

mmomtchev removed this from the 3.3 milestone Jun 16, 2021

mmomtchev added this to the 3.5 milestone Nov 9, 2021
