Releases: It4innovations/hyperqueue
Nightly build 2024-11-06
HyperQueue dev
Changes
- The hq event-log command was renamed to hq journal.
Artifact summary:
- hq-vdev-*: Main HyperQueue build containing the hq binary. Download this archive to use HyperQueue from the command line.
- hyperqueue-dev-*: Wheel containing the hyperqueue package with HyperQueue Python bindings.
v0.20.0
HyperQueue 0.20.0
New features
- It is now possible to dynamically submit new tasks into an existing job (we call this concept "Open jobs").
  See the Open jobs documentation; a sketch of the workflow is shown after this list.
- Worker streaming. Before, you could stream task stderr/stdout to the server over the network using the --log parameter of hq submit.
  This approach had various issues and was not scalable. Therefore, we have replaced this functionality with worker streaming,
  where the streaming of task output to a set of files on disk is performed by workers instead.
  This new streaming approach creates more files than the original solution (which always used one file per job),
  but the number of files stays small and is independent of the number of executed tasks.
  The new architecture also allows parallel I/O writing and storing of multiple job streams in one stream handle.
  You can enable worker streaming with the --stream parameter of hq submit. Check out the documentation for more information.
- Optimization of journal size.
- Tasks' crash counters are no longer increased when a worker is stopped by hq worker stop or by its time limit.
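A minimal sketch of the Open jobs workflow, assuming the open/submit/close subcommands described in the Open jobs documentation (hq job open, hq submit --job, hq job close); treat the exact names, flags, and placeholders as illustrative rather than definitive:

$ hq job open                        # create an empty open job; its <job-id> is printed
$ hq submit --job=<job-id> ./step1   # submit a first batch of tasks into the open job
$ hq submit --job=<job-id> ./step2   # dynamically add more tasks later
$ hq job close <job-id>              # close the job so it can complete once all tasks finish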
Removed
- Because worker streaming fully replaces the original streaming, the original server streaming was removed.
  In most cases, you can simply rename --log to --stream and hq log to hq output-log. See the docs for more details
  and the migration sketch below.
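A hedged before/after migration sketch; the show subcommand of hq output-log is assumed to mirror the former hq log interface, and the paths are placeholders:

# Before (HyperQueue 0.19 and older, server streaming, now removed):
$ hq submit --log=job.log ./my-program
$ hq log job.log show

# After (HyperQueue 0.20, worker streaming):
$ hq submit --stream=stream-dir ./my-program
$ hq output-log stream-dir show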
Fixes
- HQ should no longer crash while printing job info when a failed task does not have any workers
attached (#731).
Note
- The dashboard is still not enabled in this version.
Artifact summary:
- hq-v0.20.0-*: Main HyperQueue build containing the hq binary. Download this archive to use HyperQueue from the command line.
- hyperqueue-0.20.0-*: Wheel containing the hyperqueue package with HyperQueue Python bindings.
v0.20.0-rc2
HyperQueue 0.20.0-rc2
New features
- It is now possible to dynamically submit new tasks into an existing job (we call this concept "Open jobs").
  See the Open jobs documentation.
- Worker streaming. Before, you could stream task stderr/stdout to the server over the network using the --log parameter of hq submit.
  This approach had various issues and was not scalable. Therefore, we have replaced this functionality with worker streaming,
  where the streaming of task output to a set of files on disk is performed by workers instead.
  This new streaming approach creates more files than the original solution (which always used one file per job),
  but the number of files stays small and is independent of the number of executed tasks.
  The new architecture also allows parallel I/O writing and storing of multiple job streams in one stream handle.
  You can enable worker streaming with the --stream parameter of hq submit. Check out the documentation for more information.
- Optimization of journal size.
- Tasks' crash counters are no longer increased when a worker is stopped by hq worker stop or by its time limit.
Removed
- Because worker streaming fully replaces the original streaming, the original server streaming was removed.
  In most cases, you can simply rename --log to --stream and hq log to hq output-log. See the docs for more details.
Fixes
- HQ should no longer crash while printing job info when a failed task does not have any workers
attached (#731).
Note
- The dashboard is still not enabled in this version.
Artifact summary:
- hq-v0.20.0-rc2-*: Main HyperQueue build containing the hq binary. Download this archive to use HyperQueue from the command line.
- hyperqueue-0.20.0-rc2-*: Wheel containing the hyperqueue package with HyperQueue Python bindings.
v0.19.0
HyperQueue 0.19.0
New features
- Server resilience. Server state can be loaded back from a journal when the server crashes. This restores the state of
  submitted jobs and also of the automatic allocator queues. Find out more in the documentation; a sketch is shown after this list.
- HQ_NUM_NODES for multi-node tasks was introduced. It contains the number of nodes assigned to the task,
  so you no longer need to manually count the lines in HQ_NODE_FILE.
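A minimal sketch of using the journal for resilience; the --journal flag name is an assumption based on the current HyperQueue documentation, and the path is a placeholder:

$ hq server start --journal /path/to/hq.journal
# (the server crashes or the machine reboots)
$ hq server start --journal /path/to/hq.journal
# Restarting with the same journal restores submitted jobs and autoallocator queues.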
Changes
- The dashboard is disabled in this version. We expect to re-enable it in 1-2 release cycles.
- The node file generated for multi-node tasks now contains only short hostnames
  (e.g. if the hostname is "cn690.karolina.it4i.cz", only "cn690" is written into the node list).
  You can read HQ_HOST_FILE if you need the full hostnames without stripping; see the example after this list.
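An illustrative script for a multi-node task that uses the variables above; the script itself is a hypothetical example, while the environment variable names come from these release notes:

#!/bin/bash
# Submitted e.g. as: hq submit --nodes=4 ./run.sh
echo "This task was assigned ${HQ_NUM_NODES} nodes"
cat "${HQ_NODE_FILE}"   # short hostnames, one per line, e.g. "cn690"
cat "${HQ_HOST_FILE}"   # full hostnames, e.g. "cn690.karolina.it4i.cz"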
Fixes
- Enable passing of empty stdout/stderr to Python function tasks in the Python API (#691).
- hq alloc add --name <name> will now correctly use the passed <name> to name allocations submitted to Slurm/PBS
  (see the example below).
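A hedged illustration of a named allocation queue; the Slurm arguments after the trailing -- are site-specific placeholders, not HyperQueue options:

$ hq alloc add slurm --name my-experiment --time-limit 1h -- --partition=qcpu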
Artifact summary:
- hq-v0.19.0-*: Main HyperQueue build containing the hq binary. Download this archive to use HyperQueue from the command line.
- hyperqueue-0.19.0-*: Wheel containing the hyperqueue package with HyperQueue Python bindings.
v0.19.0-rc1
HyperQueue 0.19.0-rc1
New features
- Server resilience. Server state can be loaded back from a journal when the server crashes.
- HQ_NUM_NODES for multi-node tasks was introduced. It contains the number of nodes assigned to the task,
  so you no longer need to manually count the lines in HQ_NODE_FILE.
Changes
- The dashboard is disabled in this version. We expect to re-enable it in 1-2 release cycles.
- The node file generated for multi-node tasks now contains only short hostnames
  (e.g. if the hostname is "cn690.karolina.it4i.cz", only "cn690" is written into the node list).
  You can read HQ_HOST_FILE if you need the full hostnames without stripping.
Fixes
- Enable passing of empty stdout/stderr to Python function tasks in the Python API (#691).
- hq alloc add --name <name> will now correctly use the passed <name> to name allocations submitted to Slurm/PBS.
Artifact summary:
- hq-v0.19.0-rc1-*: Main HyperQueue build containing the hq binary. Download this archive to use HyperQueue from the command line.
- hyperqueue-0.19.0-rc1-*: Wheel containing the hyperqueue package with HyperQueue Python bindings.
v0.18.0
HyperQueue 0.18.0
Breaking changes
- The mechanism for resubmitting tasks was changed. The resubmit command was removed;
  see https://it4innovations.github.io/hyperqueue/latest/jobs/failure/ for its replacement.
- The output format of the job info command in JSON output mode has been changed. Note that
  the JSON output mode is still unstable.
New features
- Combining --time-request and --nodes is now allowed (a short example follows this list).
- Allow setting a time request for a task (the min_time resource value) using the Python API.
- Optimizations related to job submission and long-term memory savings.
- The CLI dashboard is now enabled by default. You can try it with the hq dashboard command. Note that it is still
  very experimental and a lot of useful features are missing.
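A hedged example of combining the two options, assuming --time-request still expresses the minimum remaining worker time required by the task; the script name and duration are placeholders:

# Ask for 2 nodes and only run on workers with at least 30 minutes of lifetime left.
$ hq submit --nodes=2 --time-request=30m ./mpi-job.sh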
Artifact summary:
- hq-v0.18.0-*: Main HyperQueue build containing the hq binary. Download this archive to use HyperQueue from the command line.
- hyperqueue-0.18.0-*: Wheel containing the hyperqueue package with HyperQueue Python bindings.
v0.18.0-rc1
HyperQueue 0.18.0-rc1
Breaking changes
- The mechanism for resubmitting tasks was changed. The resubmit command was removed;
  see https://it4innovations.github.io/hyperqueue/latest/jobs/failure/ for its replacement.
- The output format of the job info command in JSON output mode has been changed. Note that
  the JSON output mode is still unstable.
New features
- Combining --time-request and --nodes is now allowed.
- Allow setting a time request for a task (the min_time resource value) using the Python API.
- Optimizations related to job submission and long-term memory savings.
- The CLI dashboard is now enabled by default. You can try it with the hq dashboard command. Note that it is still
  very experimental and a lot of useful features are missing.
Artifact summary:
- hq-v0.18.0-rc1-*: Main HyperQueue build containing the hq binary. Download this archive to use HyperQueue from the command line.
- hyperqueue-0.18.0-rc1-*: Wheel containing the hyperqueue package with HyperQueue Python bindings.
v0.17.0-liberec
HyperQueue 0.17.0-liberec
Breaking change
Memory resource in megabytes
- The automatically detected resource "mem" (the size of a worker's RAM) now uses megabytes as its unit,
  i.e. --resource mem=100 now asks for 100 MiB (previously 100 bytes).
New features
Non-integer resource requests
- You may now ask for a non-integer amount of a resource, e.g. 0.5 of a GPU.
  This enables resource sharing at the logical level of the HyperQueue scheduler and allows the remaining
  part of the resource to be utilized by other tasks.
Job submission
- You can now specify cleanup modes when passing stdout/stderr paths to tasks. The cleanup mode decides what should
  happen with the file once the task has finished executing. Currently, a single cleanup mode is implemented, which removes
  the file if the task has finished successfully:
  $ hq submit --stdout=out.txt:rm-if-finished /my-program
Fixes
- Fixed a crash when a task fails during its initialization.
Artifact summary:
- hq-v0.17.0-liberec-*: Main HyperQueue build containing the hq binary. Download this archive to use HyperQueue from the command line.
- hyperqueue-0.17.0-liberec-*: Wheel containing the hyperqueue package with HyperQueue Python bindings.
v0.17.0
HyperQueue 0.17.0
Breaking change
Memory resource in megabytes
- The automatically detected resource "mem" (the size of a worker's RAM) now uses megabytes as its unit,
  i.e. --resource mem=100 now asks for 100 MiB (previously 100 bytes).
New features
Non-integer resource requests
- You may now ask for a non-integer amount of a resource, e.g. 0.5 of a GPU.
  This enables resource sharing at the logical level of the HyperQueue scheduler and allows the remaining
  part of the resource to be utilized by other tasks (see the example below).
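A hedged example of a fractional resource request; the resource name gpus and the script name are assumptions, so substitute whichever resource your workers actually report:

# Each task requests half of a GPU, so two such tasks can share one physical GPU.
$ hq submit --resource gpus=0.5 ./inference.sh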
Job submission
- You can now specify cleanup modes when passing stdout/stderr paths to tasks. The cleanup mode decides what should
  happen with the file once the task has finished executing. Currently, a single cleanup mode is implemented, which removes
  the file if the task has finished successfully:
  $ hq submit --stdout=out.txt:rm-if-finished /my-program
Fixes
- Fixed a crash when a task fails during its initialization.
Artifact summary:
- hq-v0.17.0-*: Main HyperQueue build containing the hq binary. Download this archive to use HyperQueue from the command line.
- hyperqueue-0.17.0-*: Wheel containing the hyperqueue package with HyperQueue Python bindings.
v0.17.0-rc1
HyperQueue 0.17.0-rc1
Breaking change
Memory resource in megabytes
- The automatically detected resource "mem" (the size of a worker's RAM) now uses megabytes as its unit,
  i.e. --resource mem=100 now asks for 100 MiB (previously 100 bytes).
New features
Non-integer resource requests
- You may now ask for a non-integer amount of a resource, e.g. 0.5 of a GPU.
  This enables resource sharing at the logical level of the HyperQueue scheduler and allows the remaining
  part of the resource to be utilized by other tasks.
Job submission
- You can now specify cleanup modes when passing stdout/stderr paths to tasks. The cleanup mode decides what should
  happen with the file once the task has finished executing. Currently, a single cleanup mode is implemented, which removes
  the file if the task has finished successfully:
  $ hq submit --stdout=out.txt:rm-if-finished /my-program
Fixes
- Fixed a crash when a task fails during its initialization.
Artifact summary:
- hq-v0.17.0-rc1-*: Main HyperQueue build containing the hq binary. Download this archive to use HyperQueue from the command line.
- hyperqueue-0.17.0-rc1-*: Wheel containing the hyperqueue package with HyperQueue Python bindings.