Use dynamic HostKeyAlgorithms SSH option for unknown hosts #798
Fixes the Bad protocol 2 host key algorithms error like this example. Should also resolve #784.

Background
In recent sshd changes (#744), b24f074 made the server only offer secure keys based on ed25519 and rsa. However, many SSH clients default to asking for an ecdsa-based key (less secure). These conditions would result in this experience for users:

server.yml configures the server to no longer be willing to use an ecdsa key

To avoid the connection failure due to a changed host key, b24f074 also added the HostKeyAlgorithms option to Ansible's ssh_args (in ansible.cfg), causing the very first connection to a server to request an ed25519 key that would never need to change. It appeared that this would prevent the changed host key problem for new servers.

The problem
As of OpenSSH 6.5p1 (Jan 2014), the HostKeyAlgorithms option included ed25519 (ssh_config). Although older OSs like Ubuntu 14.04 (April 2014) include a new enough version (OpenSSH 6.6p1) to handle ed25519, some Trellis users are on OSs with older OpenSSH, and the sentiment seems to be that we don't want to require them to update. For example, macOS 10.10.5 (Aug 2015) uses OpenSSH 6.2p2 (May 2013). OpenSSH for these latter users will fail if the HostKeyAlgorithms option includes ed25519: Bad protocol 2 host key algorithms.

Proposed solution
This PR enables Trellis to...

Users may disable the feature altogether by defining this somewhere in group_vars:

The feature will disable itself if users specify the --ssh-extra-args CLI option.

Trellis will display this forthright message on the very first connection to a server (and NOT on subsequent connections):
Implementation notes
ansible_ssh_extra_args is an Ansible magic var that is always defined: it is either an empty string or the content of the --ssh-extra-args CLI option. This PR optionally loads the var with the desired HostKeyAlgorithms.

ssh-keygen -F <hostname> can be used to check whether a host is in known_hosts, and it factors into the helper vars in roles/connection/defaults/main.yml.

We need to know which hosts to check for status as known/unknown. Consider this example inventory file:
Ansible's ansible_host magic var will include the most specific info it can find in the inventory file, e.g., the actual IP from the example above. The ansible_host_known helper var in this PR just runs the ssh-keygen -F check on ansible_host, and is a boolean containing true/false.

Now consider an inventory file like the one above, but without the line indicating the IP. Suppose the IP is indicated instead in the local machine's ssh config like this:
Ansible can still connect to aliasname because the SSH client sorts out the IP. However, the ansible_host magic var will equal aliasname (not the IP). The known_hosts file will only have the IP, not aliasname, so the ssh-keygen -F check on ansible_host will suggest the host is unknown when it could in fact be known. If only we could get the IP out of the SSH config file...

The ssh_config_host helper var in this PR looks up ansible_host in the ssh config, then the ssh_config_host_known boolean runs the ssh-keygen -F check on the returned ssh_config_host value. This is the same logic as the ansible_host_known var, but this time applied to the hostname from the SSH config file.

To zoom back out conceptually, the point of all this host checking is that we only want to specify HostKeyAlgorithms if the machine doesn't already have a key for the host. If the local machine already has an acceptable key but we specify a different HostKeyAlgorithms type, it will cause a host key change error.
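The two lookups described above can be sketched in plain shell. This is a minimal illustration, not the PR's actual helper vars: the function names host_known, parse_ssh_config_host, and resolve_ssh_config_host are hypothetical, and the sketch assumes that ssh -G prints the effective client configuration as lowercase keyword/value pairs, one per line.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; names are not taken from the PR.

# Print "true" if ssh-keygen finds the host in known_hosts, else "false".
# Output is discarded, and any non-zero exit status (host not found,
# no known_hosts file, ssh-keygen unavailable) is treated as "unknown".
host_known() {
  if ssh-keygen -F "$1" >/dev/null 2>&1; then
    echo true
  else
    echo false
  fi
}

# Read `ssh -G <host>` style output on stdin and print the value of the
# first "hostname" line. Prints nothing if no such line exists.
parse_ssh_config_host() {
  awk '$1 == "hostname" { print $2; exit }'
}

# Resolve an inventory name through the local SSH config.
# Prints nothing when ssh -G is unsupported (OpenSSH older than 6.8).
resolve_ssh_config_host() {
  ssh -G "$1" 2>/dev/null | parse_ssh_config_host || true
}
```

With these in place, `host_known "$(resolve_ssh_config_host aliasname)"` would report whether the address behind aliasname already has a key, which is the case that a check on the alias alone can miss.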
So, we check whether there is a key for the host as per the Ansible inventory (ansible_host_known) and as per the ssh config (ssh_config_host_known). You'll notice the condition that both these booleans must be false for the set_fact task to run (the task that loads HostKeyAlgorithms into ansible_ssh_extra_args).

Useful for testing
Q & A

Q. How does this affect...

A. The only difference from current master is that users will no longer get the Bad protocol 2 host key algorithms error.

A. No change. If the local machine already has an ed25519 or rsa key, that key will continue to be used.

A. No change, except users will no longer get the Bad protocol 2 host key algorithms error.

Q. Why not just change to rsa-based for everyone, e.g., with an SSH config entry with Host *?

A. HostKeyAlgorithms would require all hosts to send the rsa-type key, causing a host key change for any known_hosts entries that use a different key type (e.g., A LOT of host key changes for users' hosts not even related to Trellis).

Q. Will some users' OpenSSH versions be too old for the rsa-based algorithms?
A. ssh-rsa-cert-v01@openssh.com appears in the ssh_config man page as far back as OpenSSH_5.9p1 (e.g., on macOS 10.8) and in the codebase for OpenSSH_5.6p1 (e.g., used in macOS 10.7, released July 2011). (Assuming these macOS and openssh pairings are correct.) ssh-rsa is a predecessor and appears in all of the above. Of course, macOS isn't the standard, but the only issue reports have been from macOS 10.10 users. In any case, I doubt we want to support users with OpenSSH versions older than this.

Q. Could ssh -G (available only in OpenSSH 6.8+) or ssh-keygen -F fail on some systems?

A. The helper vars in defaults send most output to /dev/null 2>&1 and typically use an or (||) condition to avoid failing on any non-zero exit status.

Q. Why not move these OpenSSH-related tasks into the connection role next to this new SSH-related set_fact task?

A. Because those tasks rely on sshd role vars available only in the next play, the play that actually has the sshd role. In addition, the connection role also runs in deploy.yml, which has neither the sshd role nor its vars.
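For reference, the condition guarding the set_fact task in the implementation notes can be exercised in isolation. The sketch below is a plain-shell stand-in, not the Ansible task itself: should_set_host_key_algorithms is a hypothetical name, it takes the two booleans as the strings true/false, and it treats a non-empty third argument as the user having passed --ssh-extra-args. The group_vars opt-out is not modeled.

```shell
#!/bin/sh
# Hypothetical function for illustration; not part of the PR.
#   $1  ansible_host_known       ("true" or "false")
#   $2  ssh_config_host_known    ("true" or "false")
#   $3  current ansible_ssh_extra_args (empty unless --ssh-extra-args was given)
# Prints "true" only when the host is unknown by both checks and the
# user supplied no extra SSH args; otherwise prints "false".
should_set_host_key_algorithms() {
  if [ "${1:-}" = "false" ] && [ "${2:-}" = "false" ] && [ -z "${3:-}" ]; then
    echo true
  else
    echo false
  fi
}
```

Under these assumptions, a host known by either check, or any user-supplied extra args, leaves the existing SSH options untouched, which is what avoids a host key change error for machines that already hold a key.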