RELEASE NOTES FOR SLURM VERSION 14.11
6 July 2014
IMPORTANT NOTE:
If using the slurmdbd (Slurm DataBase Daemon) you must update this first.
The 14.11 slurmdbd will work with Slurm daemons of version 2.6 and above.
You do not need to update all clusters at the same time, but it is very
important to update the slurmdbd first and have it running before updating
any other clusters making use of it. No real harm will come from updating
your systems before the slurmdbd, but they will not talk to each other
until you do. Also, at least for the first run of the slurmdbd, make sure
your my.cnf file sets innodb_buffer_pool_size to at least 64M. You can
accomplish this by adding the line
innodb_buffer_pool_size=64M
under the [mysqld] section of the my.cnf file and restarting mysqld.
This is needed when converting large tables over to the new database schema.
Slurm can be upgraded from version 2.6 or 14.03 to version 14.11 without loss of
jobs or other state information. Upgrading directly from an earlier version of
Slurm will result in loss of state information.
HIGHLIGHTS
==========
-- Added job array data structure and removed 64k array size restriction.
-- Added support for reserving CPUs and/or memory on a compute node for system
use.
-- Added support for allocation of generic resources by model type for
heterogeneous systems (e.g. request a Kepler GPU, a Tesla GPU, or a GPU of
any type).
-- Added support for non-consumable generic resources that are limited, but
can be shared between jobs.
-- Added a Route plugin to allow messages to be forwarded using switch
topology information.
RPMBUILD CHANGES
================
CONFIGURATION FILE CHANGES (see the appropriate man page for details)
=====================================================================
-- Modify etc/cgroup.release_common.example to specify the full path to the
scontrol command. Also find the cgroup mount point by reading the
cgroup.conf file.
-- Added SchedulerParameters options of bf_yield_interval and bf_yield_sleep
to control how frequently and for how long the backfill scheduler will
relinquish its locks (see the sample configurations after this list).
-- To support larger numbers of jobs when the StateSaveDirectory is on a
file system that supports a limited number of files in a directory, add a
subdirectory called "hash.#" based upon the last digit of the job ID.
-- Added GRES type (e.g. model name) and "non_consume" fields for resources
that are limited, but can be shared between jobs.
-- Modify AuthInfo configuration parameter to accept credential lifetime
option.
-- Added ChosLoc configuration parameter in slurm.conf (Chroot OS tool
location).
-- Added MemLimitEnforce configuration parameter in slurm.conf (Used to disable
enforcement of memory limits)
-- Added PriorityParameters configuration parameter in slurm.conf (String used
to hold configuration information for the PriorityType plugin).
-- Added RequeueExit and RequeueExitHold configuration parameter in slurm.conf
(Defines job exit codes which trigger a job being requeued and/or held).
-- Add SelectTypeParameters option of CR_PACK_NODES to pack a job's tasks
tightly on its allocated nodes rather than distributing them evenly across
the allocated nodes.
-- Added PriorityFlags option of Calculate_Running to continue recalculating
the priority of running jobs.
-- Add new node configuration parameters CoreSpecCount, CPUSpecList and
MemSpecLimit which support the reservation of resources for system use
with Linux cgroup.
-- Added AllowSpecResourcesUsage configuration parameter in slurm.conf. This
allows jobs to use specialized resources on nodes allocated to them if the
job designates --core-spec=0.
-- Add new SchedulerParameters option of build_queue_timeout to throttle how
much time can be consumed building the job queue for scheduling.
-- Added HealthCheckNodeState option of "cycle" to cycle through the compute
nodes over the course of HealthCheckInterval rather than running all at
the same time.
-- Added CpuFreqDef configuration parameter in slurm.conf to specify the
default CPU frequency and governor to be set at job end.
-- Add RoutePlugin with route/default and route/topology implementations to
allow messages to be forwarded through the switch network defined in
the topology.conf file for TopologyPlugin=topology/tree.
-- Add DebugFlags=Route to allow debugging of RoutePlugin.
-- Added SchedulerParameters options of bf_max_job_array_resv to control how
many tasks of a job array should have resources reserved for them.
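
The slurm.conf fragment below illustrates the new backfill and queue
management options described above. It is a minimal sketch; the values shown
are illustrative assumptions, not recommendations (the bf_yield_* and
build_queue_timeout values are expressed in microseconds):

    SchedulerType=sched/backfill
    SchedulerParameters=bf_yield_interval=2000000,bf_yield_sleep=500000,bf_max_job_array_resv=20,build_queue_timeout=2000000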
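
A hedged sketch of core specialization, reserving one core and 512 MB of
memory per node for system use; the node name and hardware sizes are
hypothetical:

    NodeName=tux[1-16] Sockets=2 CoresPerSocket=8 RealMemory=32768 CoreSpecCount=1 MemSpecLimit=512
    AllowSpecResourcesUsage=1

With AllowSpecResourcesUsage set, a job submitted with --core-spec=0 may use
the specialized resources on the nodes allocated to it.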
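
Typed and shareable generic resources might be configured as follows; the
GPU model names and device paths are assumptions for illustration only. In
slurm.conf:

    GresTypes=gpu
    NodeName=tux[1-16] Gres=gpu:kepler:2,gpu:tesla:2

and in gres.conf on the compute nodes:

    Name=gpu Type=kepler File=/dev/nvidia[0-1]
    Name=gpu Type=tesla  File=/dev/nvidia[2-3]

A job could then request a specific model with a line such as
"srun --gres=gpu:kepler:1 ..." or any GPU with "srun --gres=gpu:1 ...".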
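
Enabling message forwarding along the switch topology takes two slurm.conf
lines plus an optional debug flag, assuming topology.conf already describes
the tree:

    TopologyPlugin=topology/tree
    RoutePlugin=route/topology
    DebugFlags=Route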
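
Several of the remaining new parameters are simple scalar settings. A hedged
example with assumed values (the health check program path and the exit
codes are placeholders):

    MemLimitEnforce=no
    CpuFreqDef=Performance
    HealthCheckProgram=/usr/sbin/nhc
    HealthCheckInterval=300
    HealthCheckNodeState=CYCLE
    RequeueExit=100
    RequeueExitHold=101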
DBD CONFIGURATION FILE CHANGES (see "man slurmdbd.conf" for details)
====================================================================
-- Added DebugFlags
COMMAND CHANGES (see man pages for details)
===========================================
-- Improve qsub wrapper support for passing environment variables.
-- Modify sdiag to report Slurm RPC traffic by type, count and time consumed.
-- Enable display of nodes anticipated to be used for pending jobs by squeue,
sview or scontrol.
-- Modify squeue --start option to print the nodes expected to be used for a
pending job (in addition to expected start time, etc.).
-- Add srun --cpu-freq options to set the CPU governor (OnDemand, Performance,
PowerSave or UserSpace).
-- Added squeue -O/--Format option that makes all job and step fields available
for printing (see the examples after this list).
-- Add "CPUs" count to output of "scontrol show step".
-- Add job "reboot" option for Linux clusters. This invokes the configured
RebootProgram to reboot nodes allocated to a job before it begins execution.
-- Added support for job email triggers: TIME_LIMIT, TIME_LIMIT_90 (reached
90% of time limit), TIME_LIMIT_80 (reached 80% of time limit), and
TIME_LIMIT_50 (reached 50% of time limit). Applies to salloc, sbatch and
srun commands.
-- Added srun --export option to set/export specific environment variables.
-- Modified scontrol to print separate error messages for the different tasks
of a job array when they have different exit codes. Applies to job suspend
and resume operations.
-- Add node state string suffix of "$" to identify nodes in maintenance
reservation or scheduled for reboot. This applies to scontrol, sinfo,
and sview commands.
-- Enable scontrol to clear a node's scheduled reboot by setting its state
to "RESUME".
-- Added squeue -P/--priority option that can be used to display pending jobs
in the same order as used by the Slurm scheduler even if jobs are submitted
to multiple partitions (job is reported once per usable partition).
-- Add sbatch job array option to limit the number of simultaneously running
tasks from a job array (e.g. "--array=0-15%4").
-- Removed --cpu_bind from sbatch and salloc. It just seemed to cause
confusion and wasn't ever handled in the allocation. A user can now only
specify the option with srun.
-- Modify scontrol job operations to accept comma delimited list of job IDs.
Applies to job update, hold, release, suspend, resume, requeue, and
requeuehold operations.
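
Illustrative uses of the new and modified query commands; the job and node
names are hypothetical, and the -O field names should be checked against
squeue(1):

    $ squeue -O jobid,name,state,timeleft        # choose arbitrary output fields
    $ squeue -P                                  # pending jobs in scheduler priority order
    $ squeue --start -j 1234                     # expected start time/nodes of a pending job
    $ scontrol hold 1000,1001,1002               # comma-delimited job ID list
    $ scontrol update NodeName=tux1 State=RESUME # clear a scheduled reboot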
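
Corresponding submission-side examples, again as a sketch (job.sh, a.out and
the variable names are placeholders):

    $ sbatch --array=0-15%4 job.sh            # at most 4 array tasks run at once
    $ sbatch --mail-type=TIME_LIMIT_90 job.sh # mail when 90% of time limit is reached
    $ sbatch --reboot job.sh                  # reboot allocated nodes before the job runs
    $ srun --cpu-freq=Performance -n16 a.out  # request the Performance CPU governor
    $ srun --export=PATH,MYVAR=42 -n1 env     # export only the named variables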
OTHER CHANGES
=============
-- Add job "reboot" option for Linux clusters. This invokes the configured
RebootProgram to reboot nodes allocated to a job before it begins execution.
-- In the job_submit plugin: Remove all slurmctld locks prior to job_submit()
being called for improved performance. If any slurmctld data structures are
read or modified, add locks directly in the plugin.
API CHANGES
===========
Changed members of the following structs
========================================
Added the following struct definitions
======================================
-- Added the following fields to struct stats_info_response_msg:
rpc_type_size, rpc_type_id, rpc_type_cnt, rpc_type_time,
rpc_user_size, rpc_user_id, rpc_user_cnt, rpc_user_time.
-- Added the following fields to struct job_info:
reboot, sched_nodes
-- Added the following fields to struct node_info:
gres_drain and gres_used
core_spec_cnt, cpu_spec_list, mem_spec_limit
-- Added the following fields to struct slurm_ctl_conf:
chos_loc, mem_limit_enforce, priority_params
requeue_exit, requeue_exit_hold, route_plugin
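
As a minimal illustration of the new job_info fields, the hedged C sketch
below (assuming Slurm's C API headers and linking with -lslurm) prints the
scheduled nodes reported for each pending job; error handling is kept to a
minimum:

    #include <stdio.h>
    #include <slurm/slurm.h>

    int main(void)
    {
        job_info_msg_t *jobs = NULL;
        uint32_t i;

        /* Load all job records from slurmctld. */
        if (slurm_load_jobs((time_t) 0, &jobs, SHOW_ALL) != SLURM_SUCCESS) {
            slurm_perror("slurm_load_jobs");
            return 1;
        }
        for (i = 0; i < jobs->record_count; i++) {
            slurm_job_info_t *job = &jobs->job_array[i];
            /* sched_nodes (new in 14.11) lists the nodes the backfill
             * scheduler expects to allocate to a pending job. */
            if (job->sched_nodes)
                printf("job %u: expected nodes %s\n",
                       job->job_id, job->sched_nodes);
        }
        slurm_free_job_info_msg(jobs);
        return 0;
    }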
Changed the following enums and #defines
========================================
-- Added #define DEBUG_FLAG_ROUTE to list of debug flags.
Added the following API's
=========================
Change the following API's
===========================
DBD API Changes
===============
Changed members of the following structs
========================================
Added the following struct definitions
======================================
Added the following enums and #defines
========================================
Added the following API's
=========================