// to display images directly on GitHub
ifdef::env-github[]
:encoding: UTF-8
:lang: en
:doctype: book
:toc: left
:imagesdir: ../images
endif::[]
////
This file is part of the PacketFence project.
See PacketFence_Clustering_Guide.asciidoc
for authors, copyright and license information.
////
//== Appendix
=== Glossary
* 'Alive quorum': An alive quorum is when more than 50% of the servers of the cluster are online and reachable on the network (pingable). This doesn't imply they offer service, but only that they are online on the network.
* 'Hard-shutdown': A hard shutdown is when a node or a service is stopped without being able to go through a proper exit cleanup. This can occur in the case of a power outage, hard reset of a server or `kill -9` of a service.
* 'Management node/server': The first server of a PacketFence cluster as defined in `/usr/local/pf/conf/cluster.conf`.
* 'Node': In the context of this document, a node is a member of the cluster while in other PacketFence documents it may represent an endpoint.
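The 'alive quorum' definition above boils down to a simple majority test. The following is a minimal sketch of that arithmetic, not PacketFence's own implementation; the function names are hypothetical, and `ping -W 2` assumes the Linux `ping` where `-W` is a timeout in seconds.

```bash
# has_alive_quorum TOTAL ONLINE
# Succeeds (exit 0) when strictly more than 50% of the members are online.
has_alive_quorum() {
    total=$1
    online=$2
    # "More than 50%" means online/total > 1/2, which is 2*online > total.
    [ $((online * 2)) -gt "$total" ]
}

# count_pingable HOST...
# Prints how many of the given hosts answer a single ping.
# Reachability only: this says nothing about whether services are running.
count_pingable() {
    reachable=0
    for host in "$@"; do
        if ping -c 1 -W 2 "$host" >/dev/null 2>&1; then
            reachable=$((reachable + 1))
        fi
    done
    echo "$reachable"
}
```

For a 3-node cluster, `has_alive_quorum 3 2` succeeds while `has_alive_quorum 3 1` fails. Note that a 2-node cluster with one node down has no alive quorum, since exactly 50% is not enough.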
=== Database via ProxySQL or haproxy-db
In PacketFence 12.0, ProxySQL became the default way for PacketFence services to obtain their connection to a database member. ProxySQL can split reads and writes across different members, which offers greater performance and scalability.
If you suspect that using ProxySQL causes issues in your deployment, you can revert to using haproxy-db by changing `database.port` in `conf/pf.conf` to `3306`.
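Before propagating the change, it can be worth confirming the value that is actually set. The sketch below assumes `pf.conf` uses an INI-style layout, with `database.port` stored as the `port` key under a `[database]` section; `ini_get` is a hypothetical helper, not a PacketFence command.

```bash
# ini_get FILE SECTION KEY
# Prints the value of KEY inside [SECTION] of an INI-style file, or nothing if absent.
ini_get() {
    awk -v section="$2" -v key="$3" -F '=' '
        /^\[/ {
            # Entering a new section: keep its name without the brackets.
            current = $0
            gsub(/^\[|\][ \t]*$/, "", current)
            next
        }
        current == section && index($0, "=") > 0 {
            k = $1
            gsub(/^[ \t]+|[ \t]+$/, "", k)
            if (k == key) {
                # Everything after the first "=" is the value.
                v = substr($0, index($0, "=") + 1)
                gsub(/^[ \t]+|[ \t]+$/, "", v)
                print v
                exit
            }
        }
    ' "$1"
}

# Usage (path taken from this guide):
#   ini_get /usr/local/pf/conf/pf.conf database port
```

If the command prints `3306`, the override is in place. An empty result means the key is not set in that file and the default applies.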
Once that is changed on one of your cluster members, propagate your change using:
[source,bash]
----
/usr/local/pf/bin/cluster/sync --as-master
/usr/local/pf/bin/pfcmd configreload hard
----
And restart PacketFence on all your cluster members:
[source,bash]
----
/usr/local/pf/bin/pfcmd service pf restart
----
Additionally, you can change pfconfig's configuration to use haproxy-db as well, although its usage of the database is extremely light. To do so, edit `conf/pfconfig.conf` and change `mysql.port` to `3306`, then restart pfconfig using `systemctl restart packetfence-config`. Note that this change must be made on all cluster members.
=== IP addresses in a cluster environment
==== DHCP and DNS services
In registration and isolation networks, each cluster member acts as a DHCP
server. The DNS configuration sent through DHCP contains the physical IP address of
each cluster member unless you enabled the option 'pfdns on VIP only' in
'System configuration -> Cluster'.
==== SNMP clients
If you use SNMP in a cluster environment, you will need to allow the physical IP
addresses of **all** cluster members to query your network devices (switches,
WiFi controllers, etc.).
The VIP address of the cluster doesn't need to be allowed in your network devices.
==== Disconnect and Change-of-Authorization (CoA) packets
Disconnect and Change-of-Authorization packets are sent from the VIP address of the RADIUS load balancer.
You only need to allow this IP address in your network devices.
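The rules above can be summarized as a small lookup: SNMP comes from every member's physical IP, while Disconnect and CoA packets come from the RADIUS VIP. This is only an illustrative sketch; the addresses are placeholders from the documentation range and `allowed_sources` is a hypothetical helper.

```bash
# Placeholder addresses (assumptions, replace with your own).
MEMBER_IPS="192.0.2.11 192.0.2.12 192.0.2.13"   # physical IP of each cluster member
RADIUS_VIP="192.0.2.10"                          # VIP of the RADIUS load balancer

# allowed_sources snmp|coa
# Prints, one per line, the source addresses a network device should accept.
allowed_sources() {
    case "$1" in
        snmp) for ip in $MEMBER_IPS; do echo "$ip"; done ;;
        coa)  echo "$RADIUS_VIP" ;;
        *)    echo "unknown traffic type: $1" >&2; return 1 ;;
    esac
}
```

`allowed_sources snmp` lists the three member addresses and `allowed_sources coa` lists only the VIP, which can then be fed into whatever ACL tooling your network devices use.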
=== Performing an upgrade on a cluster
NOTE: This guide only covers upgrading from PacketFence 11.0.0 or above.
CAUTION: Performing a live upgrade on a PacketFence cluster is not a straightforward operation and should be done meticulously.
In this procedure, the 3 nodes will be named A, B and C and they are in this order in [filename]`cluster.conf`. When we reference their hostnames, we mean the hostnames defined in [filename]`cluster.conf`.
==== Backups
Re-importable backups will be taken during the upgrade process. We highly encourage you to perform snapshots of all the virtual machines prior to the upgrade if possible.
==== Disabling the auto-correction of configuration
The PacketFence clustering stack has a mechanism that handles configuration conflicts across the servers. This mechanism will interfere with your upgrade, so you should disable it.
In order to do so, go to _Configuration->System Configuration->Maintenance_ and disable the _Cluster Check_ task.
Once this is done, restart `pfcron` on all nodes using:
[source,bash]
----
/usr/local/pf/bin/pfcmd service pfcron restart
----
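If you have SSH access to every member, the restart can be scripted from a single host. The sketch below is an assumption-laden convenience, not part of PacketFence: the hostnames are placeholders, SSH access as `root` is assumed, and by default it only prints the commands it would run.

```bash
# Hypothetical hostnames; replace with the ones from cluster.conf.
NODES="node-A-hostname node-B-hostname node-C-hostname"

# Dry-run by default: commands are printed, not executed.
# Export RUN="" before sourcing this snippet to execute them for real.
RUN="${RUN-echo}"

restart_pfcron_on_all_nodes() {
    for node in $NODES; do
        # Stop at the first node where the restart fails.
        $RUN ssh "root@$node" /usr/local/pf/bin/pfcmd service pfcron restart || return 1
    done
}
```

Running `restart_pfcron_on_all_nodes` in dry-run mode prints one `ssh` command per node, which lets you review the list before executing anything.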
==== Disabling galera-autofix
You should disable the `galera-autofix` service in the configuration to disable the automated resolution of cluster issues during the upgrade.
In order to do so, go in _Configuration->System Configuration->Services_ and disable the `galera-autofix` service.
Once this is done, stop the `galera-autofix` service on *all* nodes using:
[source,bash]
----
/usr/local/pf/bin/pfcmd service galera-autofix updatesystemd
/usr/local/pf/bin/pfcmd service galera-autofix stop
----
==== Detaching and upgrading node C
In order to be able to work on node C, we first need to stop all the
PacketFence application services on it:
[source,bash]
----
/usr/local/pf/bin/pfcmd service pf stop
----
IMPORTANT: `packetfence-config` should stay started in order to run `/usr/local/pf/bin/cluster/node` commands.
In the following steps, you will be upgrading PacketFence on node C.
===== Detach node C from the cluster
First, we need to tell A and B to ignore C in their cluster configuration. In order to do so, execute the following command **on A and B**, replacing `node-C-hostname` with the actual hostname of node C:
[source,bash]
----
/usr/local/pf/bin/cluster/node node-C-hostname disable
----
Once this is done, restart the following services on nodes A and B **one at a time**. This will cause a service failure during the restart on node A:
[source,bash]
----
/usr/local/pf/bin/pfcmd service radiusd restart
/usr/local/pf/bin/pfcmd service pfdhcplistener restart
/usr/local/pf/bin/pfcmd service haproxy-admin restart
/usr/local/pf/bin/pfcmd service haproxy-db restart
/usr/local/pf/bin/pfcmd service proxysql restart
/usr/local/pf/bin/pfcmd service haproxy-portal restart
/usr/local/pf/bin/pfcmd service keepalived restart
----
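The same sequence can be wrapped in a loop that stops at the first failed restart, so a problem on one service is noticed before moving on. This is a minimal sketch using the service list above; the `PFCMD` variable and the function name are conveniences introduced here.

```bash
# Path to pfcmd as used throughout this guide; overridable for testing.
PFCMD="${PFCMD:-/usr/local/pf/bin/pfcmd}"

# Services listed in this section, in the same order.
SERVICES="radiusd pfdhcplistener haproxy-admin haproxy-db proxysql haproxy-portal keepalived"

restart_services() {
    for svc in $SERVICES; do
        echo "Restarting $svc"
        if ! "$PFCMD" service "$svc" restart; then
            echo "Restart of $svc failed; stopping here" >&2
            return 1
        fi
    done
}
```

Run `restart_services` on node A, confirm that service is back, and only then run it on node B.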
Then, we should tell C to ignore A and B in its cluster configuration. In order to do so, execute the following commands on node C, replacing `node-A-hostname` and `node-B-hostname` with the hostnames of nodes A and B respectively.
[source,bash]
----
/usr/local/pf/bin/cluster/node node-A-hostname disable
/usr/local/pf/bin/cluster/node node-B-hostname disable
----
Now restart `packetfence-mariadb` on node C:
[source,bash]
----
systemctl restart packetfence-mariadb
----
NOTE: From this moment on, you will lose the configuration changes and data changes that occur on nodes A and B.
The commands above will make sure that nodes A and B will not be forwarding requests to C even if it is alive. The same goes for C, which won't be sending traffic to A and B. This means A and B will continue to have the same database information while C will start to diverge from it when it goes live. We'll make sure to reconcile this data afterwards.
===== Upgrade node C
From that moment, node C is in standalone mode for its database. We can proceed to update the packages, configuration and database schema.
In order to do so, <<PacketFence_Installation_Guide.asciidoc#_automation_of_upgrades,apply the upgrade process described here>> **on node C only**.
===== Check upgrade on node C
Prior to migrating the service on node C, it is advised to run a checkup of your configuration to validate your upgrade. In order to do so, perform:
[source,bash]
----
systemctl start packetfence-proxysql
/usr/local/pf/bin/pfcmd checkup
----
Review the checkup output to ensure no errors are shown. Any 'FATAL' error will prevent PacketFence from starting up and should be dealt with immediately.
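On a long checkup output, fatal entries are easy to miss. A minimal sketch, assuming fatal errors contain the literal word `FATAL` as described above (the exact output format is not specified here, and `has_fatal` is a hypothetical helper):

```bash
# has_fatal FILE
# Succeeds (exit 0) when the saved checkup output contains the word FATAL.
has_fatal() {
    grep -q 'FATAL' "$1"
}

# Usage:
#   /usr/local/pf/bin/pfcmd checkup > /tmp/checkup.out 2>&1
#   if has_fatal /tmp/checkup.out; then
#       grep 'FATAL' /tmp/checkup.out
#   fi
```

This does not replace reading the full output, since warnings may still deserve attention.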
===== Stop services on nodes A and B
Next, stop all application services on nodes A and B:
* Stop PacketFence services:
+
[source,bash]
----
/usr/local/pf/bin/pfcmd service pf stop
----
* Stop database:
+
[source,bash]
----
systemctl stop packetfence-mariadb
----
IMPORTANT: `packetfence-config` should stay started in order to run `/usr/local/pf/bin/cluster/node` commands.
===== Start service on node C
Now, start the application service on node C using the instructions provided
in
<<PacketFence_Upgrade_Guide.asciidoc#_restart_packetfence_services,Restart PacketFence services section>>.
==== Validate migration
You should now have full service on node C and should validate that all functionalities are working as expected. Once you continue past this point, there will be no way to migrate back to nodes A and B in case of issues other than to use the snapshots taken prior to the upgrade.
===== If all goes wrong
If your migration to node C goes wrong, you can fail back to nodes A and B by stopping all services on node C and starting them on nodes A and B:
.On node C
[source,bash]
----
systemctl stop packetfence-mariadb
/usr/local/pf/bin/pfcmd service pf stop
----
.On nodes A and B
[source,bash]
----
systemctl start packetfence-mariadb
/usr/local/pf/bin/pfcmd service pf start
----
When you are ready to retry the failover to node C, run the reverse of the commands above (stop the services on nodes A and B, then start them on node C) and resume your upgrade.
===== If all goes well
If you are happy about the state of your upgrade on node C, you can move on to upgrading the other nodes.
.On node A
[source,bash]
----
/usr/local/pf/bin/cluster/node node-B-hostname disable
----
.On node B
[source,bash]
----
/usr/local/pf/bin/cluster/node node-A-hostname disable
----
.On nodes A and B
[source,bash]
----
export UPGRADE_CLUSTER_SECONDARY=yes
systemctl restart packetfence-mariadb
----
Then, <<PacketFence_Installation_Guide.asciidoc#_automation_of_upgrades,apply the upgrade process described here>> **on nodes A and B**.
NOTE: It is important that you run the upgrade commands in the same shell you ran your `export` so that the environment variable is properly taken into consideration when the upgrade script executes.
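The reason is that an exported variable is only inherited by processes started from the shell where it was exported. A small demonstration, independent of PacketFence (`child_sees` is a hypothetical helper):

```bash
# Prints the value a child process sees for UPGRADE_CLUSTER_SECONDARY.
child_sees() {
    sh -c 'echo "${UPGRADE_CLUSTER_SECONDARY:-unset}"'
}

export UPGRADE_CLUSTER_SECONDARY=yes

# A child of this shell inherits the variable: prints "yes".
child_sees

# With the variable removed from the environment, as in a new session: prints "unset".
env -u UPGRADE_CLUSTER_SECONDARY sh -c 'echo "${UPGRADE_CLUSTER_SECONDARY:-unset}"'
```

If you open a new terminal or reconnect over SSH between the `export` and the upgrade, run the `export` again first.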
===== Configuration synchronisation
You should now sync the configuration by running the following **on nodes A and B**:
[source,bash]
----
/usr/local/pf/bin/cluster/sync --from=192.168.1.5 --api-user=packet --api-password=anotherMoreSecurePassword
/usr/local/pf/bin/pfcmd configreload hard
----
Where:
* `_192.168.1.5_` is the management IP of node C
* `_packet_` is the webservices username (_Configuration->Webservices_)
* `_anotherMoreSecurePassword_` is the webservices password (_Configuration->Webservices_)
==== Reintegrating nodes A and B
===== Optional step: Cleaning up data on node C
When you re-establish a cluster using node C in the steps below, your environment will be set in read-only mode for the duration of the database sync (which needs to be done from scratch).
This can take from a few minutes to an hour depending on your database size.
We highly suggest you delete data from the following tables if you don't need it:
* `radius_audit_log`: contains the data in _Auditing->RADIUS Audit Logs_
* `ip4log_history`: Archiving data for the IPv4 history
* `ip4log_archive`: Archiving data for the IPv4 history
* `locationlog_history`: Archiving data for the node location history
You can safely delete the data from all of these tables without affecting PacketFence functionality, as they are used for reporting and archiving purposes. Deleting the data from these tables can make the sync process considerably faster.
In order to truncate a table:
[source,bash]
----
mysql -u root -p pf
MariaDB> truncate TABLE_NAME;
----
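To avoid typing each statement by hand, the statements for the four tables listed above can be generated and reviewed before being sent to MariaDB. A minimal sketch (the function name is hypothetical; drop any table you want to keep from `TABLES`):

```bash
# Tables named in this section; remove any whose data you want to keep.
TABLES="radius_audit_log ip4log_history ip4log_archive locationlog_history"

# Prints one TRUNCATE statement per table, without executing anything.
truncate_statements() {
    for table in $TABLES; do
        printf 'TRUNCATE TABLE `%s`;\n' "$table"
    done
}

# Usage, after reviewing the printed statements:
#   truncate_statements | mysql -u root -p pf
```

Keeping generation and execution separate makes it harder to truncate a table by accident.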
===== Elect node C as database master
NOTE: The steps in the next sections will cause brief service disruptions.
Now that all the members are ready to reintegrate the cluster, run the following commands on **all cluster members**:
[source,bash]
----
/usr/local/pf/bin/cluster/node node-A-hostname enable
/usr/local/pf/bin/cluster/node node-B-hostname enable
/usr/local/pf/bin/cluster/node node-C-hostname enable
----
Now, stop `packetfence-mariadb` on node C, regenerate the MariaDB configuration and start it as a new master:
[source,bash]
----
systemctl stop packetfence-mariadb
/usr/local/pf/bin/pfcmd generatemariadbconfig
systemctl set-environment MARIADB_ARGS=--force-new-cluster
systemctl restart packetfence-mariadb
----
You should validate that you are able to connect to the MariaDB database even
though it is in read-only mode using the MariaDB command line:
[source,bash]
----
mysql -u root -p pf -h localhost
----
If you cannot connect, make sure you check the MariaDB log
([filename]`/usr/local/pf/logs/mariadb.log`).
===== Sync nodes A and B
On each of the servers whose data you want to discard (nodes A and B), stop `packetfence-mariadb`, destroy all the data in `/var/lib/mysql` and start `packetfence-mariadb` again so it resyncs its data from scratch:
[source,bash]
----
systemctl stop packetfence-mariadb
rm -fr /var/lib/mysql/*
systemctl start packetfence-mariadb
----
Should there be any issues during the sync, make sure you look into the MariaDB log ([filename]`/usr/local/pf/logs/mariadb.log`).
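While the nodes resync, the Galera status variable `wsrep_cluster_size` gives a quick view of how many members have joined. The sketch below parses the tab-separated output of the `mysql` client (`-N` skips column names, `-B` selects batch output); the helper names are hypothetical and the expected size of 3 matches the cluster used in this procedure.

```bash
# Reads "name<TAB>value" rows on stdin and prints the wsrep_cluster_size value.
cluster_size_from() {
    awk '$1 == "wsrep_cluster_size" { print $2 }'
}

# all_joined EXPECTED
# Succeeds when the cluster size read from stdin equals EXPECTED.
all_joined() {
    expected=$1
    actual=$(cluster_size_from)
    [ "$actual" = "$expected" ]
}

# Usage:
#   mysql -u root -p -N -B -e "SHOW STATUS LIKE 'wsrep_cluster_size'" | all_joined 3 \
#       && echo "all 3 members joined"
```

A matching size only shows membership; a node can be a member while still receiving its state transfer, so also confirm that you can connect to each node.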
Wait until both nodes have completely synced. You can verify this by connecting to each of them using the MariaDB command line.
Once you have confirmed all members are joined to the MariaDB cluster, perform the following **on node C**:
[source,bash]
----
systemctl stop packetfence-mariadb
systemctl unset-environment MARIADB_ARGS
systemctl start packetfence-mariadb
----
===== Start nodes A and B
You can now safely start PacketFence on nodes A and B using the instructions
provided in
<<PacketFence_Upgrade_Guide.asciidoc#_restart_packetfence_services,Restart
PacketFence services section>>.
The `haproxy-admin` service needs to be restarted manually on both nodes
after all services have been restarted:
[source,bash]
----
/usr/local/pf/bin/pfcmd service haproxy-admin restart
----
==== Restart node C
Now, restart PacketFence on node C so that it becomes aware of its peers again, using the instructions provided in
<<PacketFence_Upgrade_Guide.asciidoc#_restart_packetfence_services,Restart
PacketFence services section>>.
You should now have full service on all 3 nodes using the latest version of PacketFence.
==== Reactivate the configuration conflict handling
Now that your cluster is back to a healthy state, you should reactivate the configuration conflict resolution.
In order to do so, go in _Configuration->System Configuration->Maintenance_ and re-enable the _Cluster Check_ task.
Once this is done, restart `pfcron` on all nodes using:
[source,bash]
----
/usr/local/pf/bin/pfcmd service pfcron restart
----
==== Reactivate galera-autofix
You now need to reactivate and restart the `galera-autofix` service so that it's aware that all the members of the cluster are online again.
In order to do so, go in _Configuration->System Configuration->Services_ and re-enable the `galera-autofix` service.
Once this is done, restart the `galera-autofix` service on *all* nodes using:
[source,bash]
----
/usr/local/pf/bin/pfcmd service galera-autofix updatesystemd
/usr/local/pf/bin/pfcmd service galera-autofix restart
----
=== MariaDB Galera cluster troubleshooting
==== Maximum connections reached
In the event that one of the 3 servers reaches the maximum number of
connections (defaults to 1000), this will deadlock the Galera cluster
synchronization. In order to resolve this, you should first increase
`database_advanced.max_connections`, then stop `packetfence-mariadb` on all 3
servers, and follow the steps in the section <<_no_more_database_service>>
of this document. Note that you can use any of the database servers as your
source of truth.
==== Investigating further
The limit of 1000 connections is fairly high already so if you reached the maximum number of connections, this might indicate an issue with your database cluster. If this issue happens often, you should monitor the active connections and their associated queries to find out what is using up your connections.
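One way to notice the problem before the limit is hit is to compare the current connection count with the configured maximum. The sketch below is only an illustration: the 80% threshold is an arbitrary assumption, the helper names are hypothetical, and `Threads_connected` is the standard MariaDB status variable for open connections.

```bash
# usage_percent CURRENT MAX
# Prints the integer percentage of the connection limit currently in use.
usage_percent() {
    echo $(( $1 * 100 / $2 ))
}

# near_limit CURRENT MAX [THRESHOLD]
# Succeeds when usage is at or above THRESHOLD percent (default: 80).
near_limit() {
    [ "$(usage_percent "$1" "$2")" -ge "${3:-80}" ]
}

# Usage with a live value:
#   current=$(mysql -u root -p -N -B -e "SHOW STATUS LIKE 'Threads_connected'" | awk '{ print $2 }')
#   near_limit "$current" 1000 && echo "connections above 80% of the limit"
```

Running such a check periodically on each server gives some warning before the cluster synchronization deadlocks.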
You can monitor the active TCP connections to MariaDB using this command and then investigate the processes that are connected to it (last column):
  # netstat -anlp | grep 3306
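When there are hundreds of connections, grouping them by owning process is easier to read than the raw list. A minimal sketch that counts the last column (`PID/Program name`) of the `netstat` output; `count_by_process` is a hypothetical helper, and program names containing spaces would be cut at the last word.

```bash
# Reads `netstat -anlp` lines on stdin and prints "COUNT PID/PROGRAM",
# with the process holding the most connections first.
count_by_process() {
    awk '{ print $NF }' | sort | uniq -c | sort -rn | awk '{ print $1, $2 }'
}

# Usage (run as root so that netstat can show process names):
#   netstat -anlp | grep 3306 | count_by_process
```

The process at the top of the list is usually the first one to investigate.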
You can have an overview of all the current connections using the following MariaDB query:
  MariaDB> select * from information_schema.processlist;
And if you would like to see only the connections with an active query:
  MariaDB> select * from information_schema.processlist where Command!='Sleep';