
Support for systemd, SELinux, and SUSE #99

Merged
michaelwittig merged 7 commits into widdix:master on Dec 7, 2017

Conversation

KusabiSensei
Contributor

The following is an overview of the changes that are in this PR:

  • Support multiple init platforms (specifically systemd, upstart, and legacy SysV init scripts)
  • Support SUSE and other distros where USERGROUPS_ENAB is set to no in /etc/login.defs
  • Support SELinux enforcing distros (like CentOS and RHEL) by appropriately setting the NIS boolean (to allow outbound communication to a remote location to fetch user info)
  • Add a small check into the install.sh script to stop it from spamming /etc/ssh/sshd_config with the configuration directives when they already exist (e.g. if a user were to run install.sh twice)

The systemd changes also search for both styles of naming the sshd service. On Ubuntu 16.04 LTS (and likely others in the Debian family), the service is named ssh.service; everywhere else that I tested, it is named sshd.service.
Helpfully, Canonical puts an alias line in their unit file, but I feel it's more correct to search for the unit file and then use that to make the correct call.
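As a rough illustration of that lookup (not the exact install.sh code; the logic may differ), something along these lines restarts whichever unit name systemd actually knows about:

```sh
# Prefer whichever sshd unit file systemd reports, rather than relying on
# the Alias= line Canonical ships in its unit file.
if systemctl list-unit-files | grep -q '^sshd\.service'; then
    systemctl restart sshd.service
elif systemctl list-unit-files | grep -q '^ssh\.service'; then
    systemctl restart ssh.service
fi
```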

For SUSE, this is just a matter of explicitly telling useradd(8) to create a group for each user. The option has no effect on other distros (as that was default behavior), but it's probably better to be explicit with the call.
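Sketched, with the surrounding options and the $username variable as placeholders rather than the exact import_users.sh arguments:

```sh
# --user-group forces creation of a per-user group even where
# USERGROUPS_ENAB is "no" (SUSE); elsewhere it is already the default.
useradd --user-group --create-home --shell /bin/bash "$username"
```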

For SELinux, we simply check whether the SELinux utilities are installed (they are required if you use the SELinux kernel modules), and if they exist, call the selinuxenabled command to determine whether SELinux is explicitly disabled.
If it's disabled (as on Amazon Linux), we skip setting the boolean, as it would have no effect.
If it is enabled (and it doesn't matter if SELinux is in Permissive or Enforcing mode), we set the boolean so we don't clog up the audit logs (Permissive mode), or get blocked (Enforcing mode).
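A minimal sketch of that logic, assuming the standard SELinux userland tools are in place:

```sh
# Only touch the boolean when the SELinux tooling exists and SELinux is not
# disabled; -P makes the change persistent across reboots.
if command -v selinuxenabled > /dev/null 2>&1 && selinuxenabled; then
    setsebool -P nis_enabled on
fi
```

(selinuxenabled exits zero in both Permissive and Enforcing mode, which matches the behaviour described above.)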

This has been tested on the following AMIs (All AMI IDs are for region us-east-2):

  • RHEL 7.4 - RHEL-7.4_HVM_GA-20170808-x86_64-2-Hourly2-GP2 (ami-cfdafaaa)
  • CentOS Linux 7 x86_64 HVM EBS 1703_01 (ami-9cbf9bf9)
  • Ubuntu 16.04 LTS - ubuntu/images/hvm-ssd/ubuntu-xenial-16.04-amd64-server-20171121.1 (ami-82f4dae7)
  • SUSE 12 - suse-sles-12-sp3-v20171121-hvm-ssd-x86_64 (ami-36f5db53)
  • Amazon Linux - amzn-ami-hvm-2017.09.1.20171120-x86_64-gp2 (ami-15e9c770)

This should resolve the following open issues:
#84

* Add test to determine init system for sshd restart

In the install file, I've added a simple test to determine what init
system is currently on the system, and to select the correct restart
command for sshd that is required.
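Roughly, the shape of that test is as follows (the service and unit names are assumptions that vary per distribution; the actual install.sh commands may differ):

```sh
# Choose the restart mechanism that the running init system provides.
if command -v systemctl > /dev/null 2>&1; then
    systemctl restart sshd.service   # systemd (may be ssh.service on Debian/Ubuntu)
elif command -v initctl > /dev/null 2>&1; then
    service sshd restart             # upstart, e.g. Amazon Linux 2017.09
else
    /etc/init.d/sshd restart         # legacy SysV init script
fi
```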

This is tested on the following OSes in EC2:
* Amazon Linux 2017.09 (using upstart)
* Ubuntu 16.04.3 LTS (using systemd)
* SUSE Enterprise Server 12 SP3 (using systemd)
* RHEL 7.3 (using systemd)

* Hush the stderr message when systemd doesn't exist

This is to ensure that the output for a user is the same as it is
when using the standard codebase.

Message is successfully suppressed on Amazon Linux 2017.09
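For example (illustrative only; the exact command being silenced may differ), redirecting both streams keeps a host without systemd from printing "command not found" noise:

```sh
# || true keeps a missing systemctl from tripping the script's -e option
systemctl --version > /dev/null 2>&1 || true
```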

* Add useradd(1) arg to force user group creation

By default on most distros, `USERGROUPS_ENAB` is set to `yes` which
causes `useradd(1)` to create a group for a user that is added.

There is an implicit dependency on line 179 of the `import_users.sh`
script. I've altered the `useradd(1)` argument set to include
`-U/--user-group` to force this on systems where `USERGROUPS_ENAB`
is set to `no`.

This is needed to support SUSE Linux Enterprise Server on EC2.

* Add check to not spam the sshd_config file

The `install.sh` script currently checks if there is a line that reads
`AuthorizedKeysCommand none` and then changes it. The old behaviour was to
simply append the line at the end of the file if this line didn't exist.

There is now a check to see if the correct line that we want already
exists, so we don't spam the end of the configuration file.
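For instance (the directive value here is a placeholder for whatever install.sh actually configures):

```sh
AKC_LINE='AuthorizedKeysCommand /opt/aws-ec2-ssh/authorized_keys_command.sh'
# Only append if the exact directive is not already in sshd_config.
if ! grep -qF "$AKC_LINE" /etc/ssh/sshd_config; then
    echo "$AKC_LINE" >> /etc/ssh/sshd_config
fi
```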
* Add SELinux support to install script

If we have the `getenforce` command available to us, and we find that
it returns the value `Enforcing` (meaning SELinux will block our access
to AWS when called by `sshd`), we enable the `nis_enabled` boolean
persistently in SELinux to inform SELinux that it should expect outbound
access from sshd and other login programs (like PAM) to remote servers
to get user account information.

This allows `sshd` to call the script to get the user's public key from IAM.

I also cleaned up the `systemd` conditional in the script to use the
simpler way to check if commands exist.

* Revert the supposedly clever way of command checks

Chaining the command check with the return value failed miserably.

Reverting this to be normal again.

* Allow which(1) to exit and the script to continue

We use `which(1)` to determine if a command is available to us. According
to the man page, `which(1)` will return a return code `1` if the command is
not found. Since the script is run with `-e` as an option in the shebang,
this causes the script to exit prematurely.

I've wrapped the `which(1)` calls in `set +e/set -e` wrappers so that the
script continues gracefully.

* Adjust which(1) pipelines to return valid exit codes

On systems where `which(1)` doesn't find a given command, it will return exit
code `1`. This will cause the script to abort under the `-e` option in the
shebang. I've changed the pipeline to capture the return value of which on the
other side of a logical OR, which does work correctly, and thus the pipeline
gets a 0 return code (making `-e` happy), and I get the return value which
tells me whether or not the command exists.

Which makes me happy.

* Adjust which(1) blocks to account for shortcircuit eval

I forgot about shortcircuit evaluation. So I've added the success code
in the variable before calling `which(1)`. This means that if the command
is found, the 0 retval will get used in the test. The only time that it
would change that to a non-zero status would be if something wasn't found.

Like before, the whole command pipeline will return success, so `-e` stays
happy as a clam.
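Put together, the pattern described across these commits looks roughly like this (the variable name is illustrative):

```sh
# Pre-seed with success so short-circuit evaluation leaves 0 in place when
# which(1) finds the command; the || branch captures a non-zero code
# without tripping set -e.
systemctl_status=0
which systemctl > /dev/null 2>&1 || systemctl_status=$?
if [ "$systemctl_status" -eq 0 ]; then
    echo "systemd is available"
fi
```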
michaelwittig merged commit 73bd78f into widdix:master on Dec 7, 2017
@michaelwittig
Contributor

thanks!

@michaelwittig
Contributor

It looks like the changes broke the tests we have in place. I will have a look at this soon.

@KusabiSensei
Contributor Author

Sorry about that! I had been testing manually on EC2 instances that I had spun up, since I was trying to get something working that I was going to deploy internally in my company's cluster via updated AMIs. I hadn't thought of running the Maven test suite with CloudFormation.

If you can share the transcript, I'll do what I can to help resolve the test suite issues.

@michaelwittig
Contributor

michaelwittig commented Dec 10, 2017

So we currently have two integration tests that basically set up a CloudFormation stack based on showcase.yaml. The Amazon Linux one is still working. For the Ubuntu stack, the ssh login no longer works.

I set up the stack manually, also specifying a KeyPair that AWS deploys. I can log in with the ubuntu user, but not with my IAM users. After running systemctl restart ssh.service it was working. So it seems that for some reason sshd is not "reloaded" with the new config?

@KusabiSensei
Contributor Author

It's interesting that it wasn't reloading. What AMI ID are you using when you spin up those Ubuntu systems?

@michaelwittig
Contributor

e.g. in us-east-1: ami-cd0f5cb6 also called ubuntu/images/hvm-ssd/ubuntu-xenial-16.04-amd64-server-20170721

based on this template: https://github.com/widdix/aws-ec2-ssh/blob/master/showcase.yaml

@KusabiSensei
Contributor Author

OK I'll give that a try and see what happens.

ebmeierj pushed a commit to ebmeierj/aws-ec2-ssh that referenced this pull request Dec 20, 2017
* Support Multiple init daemons and service programs (widdix#1)

@michaelwittig
Contributor

Did you discover anything? Otherwise I'll have to revert the change for now until we have fixed this. I want to create a new release...

@KusabiSensei
Contributor Author

I will complete testing the showcase today. I will update here with the results.

If a change is needed, I can likely reverse the logic to make systemd the base case. Amazon Linux 2 (LTS) is now on systemd, so that may be a consideration.

As an aside, I am looking into whether a .deb package for Ubuntu would be more reliable at triggering the service refresh (analogous to the RPM).

@KusabiSensei
Contributor Author

Here's what I'm seeing at the moment. I can replicate the "not restarting SSH" issue with the showcase (#103 is a different failure case), but manual installation using install.sh works correctly. (Both of these should be identical, however.)

The following is a transcript of the steps the showcase YAML is expected to perform: fetching the script from the URL, changing its mode to 0755, executing it, and then showing that the sshd process has been restarted.

I'm going to see if I can get CloudFormation to restart the SSH service at the end of stack creation.

root@ip-10-76-0-10:/opt# wget https://raw.githubusercontent.com/widdix/aws-ec2-ssh/master/install.sh
--2017-12-21 14:21:19--  https://raw.githubusercontent.com/widdix/aws-ec2-ssh/master/install.sh
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.200.133
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.200.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 5982 (5.8K) [text/plain]
Saving to: ‘install.sh’

install.sh                       100%[==========================================================>]   5.84K  --.-KB/s    in 0s      

2017-12-21 14:21:19 (117 MB/s) - ‘install.sh’ saved [5982/5982]

root@ip-10-76-0-10:/opt# ls
install.sh
root@ip-10-76-0-10:/opt# chmod 755 install.sh 
root@ip-10-76-0-10:/opt# 
root@ip-10-76-0-10:/opt# ./install.sh 
Cloning into 'aws-ec2-ssh'...
remote: Counting objects: 350, done.
remote: Compressing objects: 100% (7/7), done.
remote: Total 350 (delta 1), reused 4 (delta 1), pack-reused 342
Receiving objects: 100% (350/350), 166.58 KiB | 0 bytes/s, done.
Resolving deltas: 100% (177/177), done.
Checking connectivity... done.
root@ip-10-76-0-10:/opt# systemctl status ssh
● ssh.service - OpenBSD Secure Shell server
   Loaded: loaded (/lib/systemd/system/ssh.service; enabled; vendor preset: enabled)
   Active: active (running) since Thu 2017-12-21 14:21:36 UTC; 5s ago
 Main PID: 4174 (sshd)
    Tasks: 1
   Memory: 724.0K
      CPU: 3ms
   CGroup: /system.slice/ssh.service
           └─4174 /usr/sbin/sshd -D

Dec 21 14:21:36 ip-10-76-0-10 systemd[1]: Stopping OpenBSD Secure Shell server...
Dec 21 14:21:36 ip-10-76-0-10 systemd[1]: Stopped OpenBSD Secure Shell server.
Dec 21 14:21:36 ip-10-76-0-10 systemd[1]: Starting OpenBSD Secure Shell server...
Dec 21 14:21:36 ip-10-76-0-10 sshd[4174]: Server listening on 0.0.0.0 port 22.
Dec 21 14:21:36 ip-10-76-0-10 systemd[1]: Started OpenBSD Secure Shell server.
Dec 21 14:21:36 ip-10-76-0-10 sshd[4174]: Server listening on :: port 22.
root@ip-10-76-0-10:/opt# 

@KusabiSensei
Contributor Author

Good news. I've found a way to add the ssh service to the CloudFormation stack, so it automatically restarts it when using the showcase on both Ubuntu and Amazon Linux.

I'll run the Java test suites to confirm it.

@KusabiSensei
Contributor Author

Looks good with the changes to the showcase YAML. I'll open an issue and PR for it.

Results:

Tests run: 2, Failures: 0, Errors: 0, Skipped: 0

KusabiSensei mentioned this pull request Dec 27, 2017