RELEASE NOTES FOR SLURM VERSION 14.11
8 October 2014
IMPORTANT NOTE:
If using the slurmdbd (Slurm DataBase Daemon) you must update this first.
The 14.11 slurmdbd will work with Slurm daemons of version 2.6 and above.
You will not need to update all clusters at the same time, but it is very
important to update slurmdbd first and have it running before updating
any other clusters making use of it. No real harm will come from updating
your systems before the slurmdbd, but they will not talk to each other
until you do. Also, at least the first time you run the slurmdbd, you need to
make sure your my.cnf file has innodb_buffer_pool_size equal to at least 64M.
You can accomplish this by adding the line
innodb_buffer_pool_size=64M
under the [mysqld] section in the my.cnf file and restarting mysqld.
This is needed when converting large tables over to the new database schema.
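The check described above can be sketched in a few lines of Python. This is only an illustration: the helper names parse_size and buffer_pool_ok are made up for this example, the sample file contents are invented, and a real my.cnf may use constructs (such as !include directives or the dashed spelling innodb-buffer-pool-size) that this simple parser does not handle.

```python
import configparser

def parse_size(text):
    """Convert a MySQL-style size such as '64M' or '1G' to bytes."""
    units = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}
    text = text.strip().upper()
    if text and text[-1] in units:
        return int(text[:-1]) * units[text[-1]]
    return int(text)

def buffer_pool_ok(my_cnf_text, minimum="64M"):
    """Return True if [mysqld] sets innodb_buffer_pool_size >= minimum."""
    parser = configparser.ConfigParser(
        allow_no_value=True,   # my.cnf allows bare flags without a value
        strict=False,          # tolerate repeated sections or options
        interpolation=None,    # do not treat '%' specially
    )
    parser.read_string(my_cnf_text)
    value = parser.get("mysqld", "innodb_buffer_pool_size", fallback=None)
    if value is None:
        return False
    return parse_size(value) >= parse_size(minimum)

# Invented sample contents, not a recommended production configuration.
SAMPLE = """\
[mysqld]
datadir=/var/lib/mysql
innodb_buffer_pool_size=64M
"""

if __name__ == "__main__":
    print("buffer pool large enough:", buffer_pool_ok(SAMPLE))
```

To check a real file, read its text and pass it to buffer_pool_ok() before starting the upgraded slurmdbd.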
Slurm can be upgraded from version 2.6 or 14.03 to version 14.11 without loss of
jobs or other state information. Upgrading directly from an earlier version of
Slurm will result in loss of state information.
HIGHLIGHTS
==========
-- Added job array data structure and removed 64k array size restriction.
-- Added support for reserving CPUs and/or memory on a compute node for system
use.
-- Added support for allocation of generic resources by model type for
heterogeneous systems (e.g. request a Kepler GPU, a Tesla GPU, or a GPU of
any type).
-- Added support for non-consumable generic resources that are limited, but
can be shared between jobs.
-- Added support for automatic job requeue policy based on exit value.
-- Refactor job_submit/lua interface. LUA FUNCTIONS NEED TO CHANGE! The
lua script no longer needs to explicitly load meta-tables, but information
is available directly using names slurm.reservations, slurm.jobs,
slurm.log_info, etc. Also, the job_submit.lua script is reloaded when
updated without restarting the slurmctld daemon.
-- Eliminate native Cray specific port management. Native Cray systems must
now use the MpiParams configuration parameter to specify ports to be used
for communications. When upgrading Native Cray systems from version 14.03,
all running jobs should be killed and the switch_cray_state file (in
StateSaveLocation of the nodes where the slurmctld daemon runs) must be
explicitly deleted.
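As a rough illustration of the requeue-on-exit-value policy listed among the highlights, the decision can be modeled as a lookup of the job's exit code in two sets, corresponding to the RequeueExit and RequeueExitHold parameters described under CONFIGURATION FILE CHANGES. The function name requeue_action is hypothetical, and the rule that "hold" wins when a code appears in both sets is an assumption of this sketch, not documented behavior.

```python
def requeue_action(exit_code, requeue_exit=(), requeue_exit_hold=()):
    """Return 'requeue_hold', 'requeue', or 'none' for a finished job.

    requeue_exit and requeue_exit_hold are collections of integer exit
    codes, standing in for the values configured in slurm.conf.
    """
    # Assumption of this sketch: a code listed in both sets is held.
    if exit_code in requeue_exit_hold:
        return "requeue_hold"
    if exit_code in requeue_exit:
        return "requeue"
    return "none"

if __name__ == "__main__":
    for code in (0, 3, 7):
        print(code, requeue_action(code, requeue_exit={3}, requeue_exit_hold={7}))
```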
RPMBUILD CHANGES
================
CONFIGURATION FILE CHANGES (see appropriate man page for details)
=================================================================
-- Modify etc/cgroup.release_common.example to specify the full path to the
scontrol command. Also find cgroup mount point by reading cgroup.conf file.
-- Added SchedulerParameters options of bf_yield_interval and bf_yield_sleep
to control how frequently and for how long the backfill scheduler will
relinquish its locks.
-- To support larger numbers of jobs when the StateSaveLocation is on a
file system that supports a limited number of files in a directory, add a
subdirectory called "hash.#" based upon the last digit of the job ID.
-- Added GRES type (e.g. model name) and "non_consume" fields for resources
that are limited, but can be shared between jobs.
-- Modify AuthInfo configuration parameter to accept credential lifetime
option.
-- Added ChosLoc configuration parameter in slurm.conf (Chroot OS tool
location).
-- Added MemLimitEnforce configuration parameter in slurm.conf (Used to disable
enforcement of memory limits).
-- Added PriorityParameters configuration parameter in slurm.conf (String used
to hold configuration information for the PriorityType plugin).
-- Added RequeueExit and RequeueExitHold configuration parameters in slurm.conf
(Defines job exit codes which trigger a job being requeued and/or held).
-- Add SelectTypeParameters option of CR_PACK_NODES to pack a job's tasks
tightly on its allocated nodes rather than distributing them evenly across
the allocated nodes.
-- Added PriorityFlags option of Calculate_Running to continue recalculating
the priority of running jobs.
-- Add new node configuration parameters CoreSpecCount, CPUSpecList and
MemSpecLimit which support the reservation of resources for system use
with Linux cgroup.
-- Added AllowSpecResourcesUsage configuration parameter in slurm.conf. This
allows jobs to use specialized resources on nodes allocated to them if the
job designates --core-spec=0.
-- Add new SchedulerParameters option of build_queue_timeout to throttle how
much time can be consumed building the job queue for scheduling.
-- Added HealthCheckNodeState option of "cycle" to cycle through the compute
nodes over the course of HealthCheckInterval rather than running all at
the same time.
-- Added CpuFreqDef configuration parameter in slurm.conf to specify the
default CPU frequency and governor to be set at job end.
-- Add RoutePlugin with route/default and route/topology implementations to
allow messages to be forwarded through the switch network defined in
the topology.conf file for TopologyPlugin=topology/tree.
-- Add DebugFlags=Route to allow debugging of RoutePlugin.
-- Added SchedulerParameters option of bf_max_job_array_resv to control how
many tasks of a job array should have resources reserved for them.
-- Add ability to include other files in slurm.conf based upon the ClusterName.
-- Add SchedulerParameters option of pack_serial_at_end to put serial jobs at
the end of the available nodes rather than using a best fit algorithm.
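The "hash.#" subdirectory scheme mentioned in the list above spreads job state across ten subdirectories keyed on the last digit of the job ID. A minimal sketch of that mapping follows. The function name job_state_dir and the example path are hypothetical, and the layout of files inside each subdirectory is not covered here.

```python
import posixpath

def job_state_dir(state_dir, job_id):
    """Return the 'hash.#' subdirectory for a job, keyed on its last digit."""
    if job_id < 0:
        raise ValueError("job_id must be non-negative")
    return posixpath.join(state_dir, f"hash.{job_id % 10}")

if __name__ == "__main__":
    # Example path only; use the state save directory of your own cluster.
    for jid in (40, 41, 1234567):
        print(jid, job_state_dir("/var/spool/slurm", jid))
```

With ten buckets, each subdirectory holds roughly one tenth of the job entries, which is what keeps the file count per directory down.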
DBD CONFIGURATION FILE CHANGES (see "man slurmdbd.conf" for details)
====================================================================
-- Added DebugFlags
COMMAND CHANGES (see man pages for details)
===========================================
-- Improve qsub wrapper support for passing environment variables.
-- Modify sdiag to report Slurm RPC traffic by type, count and time consumed.
-- Enable display of nodes anticipated to be used for pending jobs by squeue,
sview or scontrol.
-- Modify squeue --start option to print the nodes expected to be used for
pending job (in addition to expected start time, etc.).
-- Add srun --cpu-freq options to set the CPU governor (OnDemand, Performance,
PowerSave or UserSpace).
-- Added squeue -O/--Format option that makes all job and step fields available
for printing.
-- Add "CPUs" count to output of "scontrol show step".
-- Add job "reboot" option for Linux clusters. This invokes the configured
RebootProgram to reboot nodes allocated to a job before it begins execution.
-- Added support for job email triggers: TIME_LIMIT, TIME_LIMIT_90 (reached
90% of time limit), TIME_LIMIT_80 (reached 80% of time limit), and
TIME_LIMIT_50 (reached 50% of time limit). Applies to salloc, sbatch and
srun commands.
-- Added srun --export option to set/export specific environment variables.
-- Scontrol modified to print separate error messages for job arrays with
different exit codes on the different tasks of the job array. Applies to
job suspend and resume operations.
-- Add node state string suffix of "$" to identify nodes in maintenance
reservation or scheduled for reboot. This applies to scontrol, sinfo,
and sview commands.
-- Enable scontrol to clear a node's scheduled reboot by setting its state
to "RESUME".
-- Added squeue -P/--priority option that can be used to display pending jobs
in the same order as used by the Slurm scheduler even if jobs are submitted
to multiple partitions (job is reported once per usable partition).
-- Add sbatch job array option to limit the number of simultaneously running
tasks from a job array (e.g. "--array=0-15%4").
-- Removed --cpu_bind from sbatch and salloc. It just seemed to cause
confusion and wasn't ever handled in the allocation. A user can now only
specify the option with srun.
-- Modify scontrol job operations to accept comma delimited list of job IDs.
Applies to job update, hold, release, suspend, resume, requeue, and
requeuehold operations.
-- Added ability for "scontrol update" to reference jobs by JobName (and
filtered optionally by UserID).
-- Add support for an advanced reservation start time that remains constant
relative to the current time. This can be used to prevent the starting of
longer running jobs on select nodes for maintenance purposes. See the
reservation flag "TIME_FLOAT" for more information.
-- Added "scontrol write config" option to save a copy of the current
configuration in a file containing a time stamp.
-- Added "sacctmgr reconfigure" option to cause slurmdbd to read current
configuration.
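To make the job array throttle syntax from the list above concrete (e.g. "--array=0-15%4"), here is a small sketch that splits such a specification into the task IDs and the limit on simultaneously running tasks. The function name parse_array_spec is hypothetical. It only handles comma-separated IDs and simple ranges with an optional "%limit" suffix, and is not a reimplementation of the sbatch parser.

```python
def parse_array_spec(spec):
    """Split 'IDS%LIMIT' into (task_ids, max_running).

    max_running is None when no '%LIMIT' suffix is present.
    """
    ids_part, _, limit_part = spec.partition("%")
    max_running = int(limit_part) if limit_part else None
    task_ids = []
    for chunk in ids_part.split(","):
        first, dash, last = chunk.partition("-")
        if dash:
            # Inclusive range such as "0-15".
            task_ids.extend(range(int(first), int(last) + 1))
        else:
            task_ids.append(int(first))
    return task_ids, max_running

if __name__ == "__main__":
    ids, limit = parse_array_spec("0-15%4")
    print(len(ids), "tasks, at most", limit, "running at once")
```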
OTHER CHANGES
=============
-- In the job_submit plugin: Remove all slurmctld locks prior to job_submit()
being called for improved performance. If any slurmctld data structures are
read or modified, add locks directly in the plugin.
-- Cray MPMD (Multiple-Program Multiple-Data) support completed.
API CHANGES
===========
Changed members of the following structs
========================================
-- Changed the following fields in struct front_end_info:
node_state change from 16-bits to 32-bits
-- Changed the following fields in struct node_info:
node_state change from 16-bits to 32-bits
-- Changed the following fields in struct slurm_update_front_end_msg:
node_state change from 16-bits to 32-bits
-- Changed the following fields in struct slurm_update_node_msg:
node_state change from 16-bits to 32-bits
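One practical consequence of the node_state widening listed above is that code which still copies the value into a 16-bit variable will drop any bits above bit 15. The sketch below illustrates that truncation. The constants are hypothetical values chosen for the example and are not Slurm's real state or flag definitions.

```python
import struct

# Hypothetical values for illustration only; not Slurm's real constants.
BASE_STATE_IDLE = 0x0001
EXAMPLE_HIGH_FLAG = 0x00010000  # a flag bit that needs more than 16 bits

def store_as_16_bit(value):
    """Mimic assigning to a 16-bit unsigned field: high bits are dropped."""
    return value & 0xFFFF

def roundtrip_32_bit(value):
    """Pack and unpack the value as an unsigned 32-bit integer."""
    return struct.unpack("<I", struct.pack("<I", value))[0]

state = BASE_STATE_IDLE | EXAMPLE_HIGH_FLAG

if __name__ == "__main__":
    print(hex(state), hex(store_as_16_bit(state)), hex(roundtrip_32_bit(state)))
```

Programs built against the older headers should be recompiled so that their copies of node_state use the 32-bit type.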
Added members to the following struct definitions
=================================================
-- Added the following fields to struct stats_info_response_msg:
rpc_type_size, rpc_type_id, rpc_type_cnt, rpc_type_time,
rpc_user_size, rpc_user_id, rpc_user_cnt, rpc_user_time.
-- Added the following fields to struct job_descriptor:
job_id_str
-- Added the following fields to struct job_info:
array_bitmap, array_max_tasks, array_task_str, reboot, sched_nodes
-- Added the following fields to struct node_info:
gres_drain and gres_used
core_spec_cnt, cpu_spec_list, mem_spec_limit
-- Added the following fields to struct slurm_ctl_conf:
chos_loc, cpu_freq_def, layouts, mem_limit_enforce, priority_params,
requeue_exit, requeue_exit_hold, route_plugin, srun_port_range,
use_spec_resources
-- Added the following fields to struct suspend_msg:
job_id_str
-- Added the following fields to struct slurm_step_ctx_params_t:
profile
Added the following struct definitions
======================================
job_array_resp_msg_t - Job array response data structure
Changed the following enums and #defines
========================================
CPU_FREQ_* - Identification of CPU governors to use
DEBUG_FLAG_* - Many new DebugFlag values defined
HEALTH_CHECK_CYCLE - Cycle through nodes for health check rather than trying
to run health check in parallel on large numbers of nodes
KILL_STEPS_ONLY - Do not signal batch script
MAIL_JOB_TIME* - Email event triggers based upon job's run time relative to
its time limit
PRIORITY_FLAGS_* - New job priority calculation options
RESERVE_FLAG_TIME_FLOAT - Identify a reservation with a start time that is
relative to the current time
WAIT_ASSOC_*, WAIT_QOS_* - Many new job "reason" flags added to better identify
why a job is pending rather than running. This includes a
detailed identification of specific association and QOS limits.
Added the following API's
=========================
slurm_free_job_array_resp() - Free job array RPC responses
slurm_kill_job2() - Similar to slurm_kill_job(), but supports job arrays
slurm_kill_job_step2() - Similar to slurm_kill_job_step(), but supports job arrays
slurm_requeue2() - Similar to slurm_requeue(), but supports job arrays
slurm_resume2() - Similar to slurm_resume(), but supports job arrays
slurm_suspend2() - Similar to slurm_suspend(), but supports job arrays
slurm_update_job2() - Similar to slurm_update_job(), but supports job arrays
slurm_write_ctl_conf() - Write the contents of the Slurm control configuration
message, as loaded using slurm_load_ctl_conf(), to a file
Changed the following API's
===========================
slurm_set_debugflags() - Debug flags argument changed from 32-bit to 64-bit
slurm_signal_job_step() - Signal value changed from 16-bit to 32-bit
DBD API Changes
===============
Changed members of the following structs
========================================
Added members to the following struct definitions
=================================================
-- Added the following fields to struct slurmdb_association_rec:
assoc_next, assoc_next_id (for hash table)
-- Added the following fields to struct slurmdb_job_rec_t:
array_max_tasks, array_task_str, alloc_gres, req_gres, used_gres, resv_name
-- Added the following fields to struct slurmdb_reservation_rec_t:
array_job_id, array_task_id
-- Added the following fields to struct slurmdb_select_step_t:
array_task_id
-- Added the following fields to struct slurmdb_qos_rec_t:
min_cpus_pj
Added the following enums and #defines
======================================
Added the following API's
=========================
slurmdb_reconfig() - Reconfigure the slurmdbd (re-read the configuration file)