Skip to content

Latest commit

 

History

History
87 lines (69 loc) · 3.46 KB

README.md

File metadata and controls

87 lines (69 loc) · 3.46 KB

subd

The daemon that runs on every dominated system.

This daemon continuously checksum scans the root file-system and responds to poll, fetch files and update RPC requests from the dominator. In order to have a neglibible impact on system workload, it lowers its priority (nice 15 by default), restricts itself to one CPU and automatically rate limits its I/O to be 2% of the media speed.

Status page

Subd provides a web interface on port 6969 which provides a status page, access to performance metrics and logs. If subd is running on host myhost then the URL of the main status page is http://myhost:6969/. An RPC over HTTP interface is also provided over the same port.

Startup

Subd is started at boot time, usually by one of the provided init scripts. The subd process is baby-sat by the init script; if the process dies the init script will re-start subd. It may be stopped with the command:

service subd stop

which also kills the baby-sitting init script. It may be started with the command:

service subd start

There are many command-line flags which may change the behaviour of subd but the defaults should be adequate for most deployments. Built-in help is available with the command:

subd -h

Security

RPC access is restricted using TLS client authentication. Subd expects a root certificate in the file /etc/ssl/CA.pem which it trusts to sign certificates which grant access. It also requires a certificate and key which grant it the ability to fetch files from the objectserver. These should be in the files /etc/ssl/subd/cert.pem and /etc/ssl/subd/key.pem, respectively.

If any of these files are missing, subd will refuse to start. This prevents accidental deployments without access control.

Control and debugging

The subtool utility may be used to manipulate various operating parameters of a running subd and perform RPC requests.

DisruptionManager

Disruptive updates can be controlled using an optional Disruption Manager which subd can run to request, check and cancel requests to perform a disruptive upgrade (an upgrade where a HighImpact trigger is called). This may be used to request that new work will not be scheduled on the machine and wait for existing work to complete before performing the upgrade.

The Disruption Manager is a simple tool which takes one of the following arguments:

  • cancel: cancel a request to disrupt
  • check: check whether disruptions are permitted
  • request: request to perform disruption

Regardless of the argument provided, the tool must return one of the following exit codes:

  • 0: disruption is permitted
  • 1: disruption has been requested (and acknowledged) but not yet permitted
  • 2: disruption is denied (not currently permitted)

Any other exit code is considered an error, and subd may retry again soon.

After a request to perform a disruptive upgrade, if the exit code is 1 (disruption requested and acknowledged), the request will be re-issued periodically. If however the exit code is 2 (upgrade is not permitted), the request will be re-issued more frequently.

Once a machine enters the disruption is permitted state, it must remain in that state until a cancel command is made, or more than one hour has passed since the last request is made.

The DisruptionManager may be called frequently (up to every second) by every machine in the fleet.