
Run SSM Agent on AWS by default #107

Closed
pothos opened this issue May 4, 2020 · 24 comments
Labels
kind/feature A feature request

Comments

@pothos (Member) commented May 4, 2020

Current situation
The systemd service for the AWS Session Manager agent is not installed by default. It can be started as a Docker service with a customized image.

Impact
Users don't have this functionality available unless they know the system very well and are ready to run the agent themselves.

Ideal future situation
The agent runs by default if there is no drawback.

Implementation options
Run it via Docker with a customized image as described here. There is an OEM package for EC2, but I recommend including the service file in the regular /usr partition rather than in the OEM package, and only starting it under a systemd unit condition that checks that the kernel command line includes the OEM ID, i.e., ConditionKernelCommandLine=flatcar.oem.id=ec2 (and maybe also coreos.oem.id=ec2).
Build the Docker image in our quay repo and tag it with a version.
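A unit gated on the kernel command line could look roughly like the following sketch; the unit name, image name, and tag are illustrative, not actual shipped files:

```ini
# Sketch: /usr/lib/systemd/system/amazon-ssm-agent.service
# (unit and image names are illustrative)
[Unit]
Description=Amazon SSM agent (run via Docker)
# The "|" prefix marks triggering conditions: either OEM ID on the
# kernel command line is enough to start the unit
ConditionKernelCommandLine=|flatcar.oem.id=ec2
ConditionKernelCommandLine=|coreos.oem.id=ec2
After=docker.service
Requires=docker.service

[Service]
ExecStart=/usr/bin/docker run --rm --name amazon-ssm-agent quay.io/flatcar/amazon-ssm-agent:latest
ExecStop=/usr/bin/docker stop amazon-ssm-agent
Restart=on-failure

[Install]
WantedBy=multi-user.target
```

Because the unit lives in /usr, it is updated through the regular A/B mechanism; only the condition makes it EC2-specific.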

Additional information
All relevant links are in the blog post linked above.

pothos added the kind/feature label on May 4, 2020
@samm-git commented:

I am interested and happy to contribute here. I don't like the option of running it via Docker: like SSH, it is a tool to access the host itself, not container internals. Also, the agent itself is just a few Go binaries without any external dependencies. I was able to run the agent on a host by copying files from my Docker container. Below is my Dockerfile to build it:

FROM golang:1.12 as builder
ARG VERSION=2.3.1205.0
RUN set -ex && apt-get update && apt-get install -y make git gcc libc-dev curl bash && \
    curl -sLO https://github.com/aws/amazon-ssm-agent/archive/${VERSION}.tar.gz && \
    mkdir -p /go/src/github.com && \
    tar xzf ${VERSION}.tar.gz && \
    mv amazon-ssm-agent-${VERSION} /go/src/github.com/amazon-ssm-agent && \
    cd /go/src/github.com/amazon-ssm-agent && \
    gofmt -w agent && (make checkstyle || ./Tools/bin/goimports -w agent) && \
    make build-linux

## Copy the built binaries from this stage onto the host (e.g. /opt/ssm or /usr/bin):
#COPY --from=builder /go/src/github.com/amazon-ssm-agent/bin/linux_amd64/ /usr/bin/
## And the config templates to /etc/amazon/ssm/:
#COPY --from=builder /go/src/github.com/amazon-ssm-agent/bin/amazon-ssm-agent.json.template /etc/amazon/ssm/amazon-ssm-agent.json
#COPY --from=builder /go/src/github.com/amazon-ssm-agent/bin/seelog_unix.xml /etc/amazon/ssm/seelog.xml

@samm-git commented:

@dongsupark do you think it's a good thing to have? I can start working on it if you feel it could be included in the official build.

@samm-git commented:

@dongsupark the only question I have (probably a dumb one) is how to add the package in such a way that it is only included in the AWS AMI.

@pothos (Member, Author) commented May 15, 2020

Hi,
thanks for stepping in. The idea of running it via Docker is that it can be easily updated, which is otherwise not the case for OEM packages, because they are extracted on installation and are not part of the A/B update mechanism for the /usr partition.
Currently the oem-ec2 ebuild package in coreos-overlay is responsible for the OEM partition contents of the AMI, but my idea was to move this single service file, which is just a few bytes to invoke Docker, to the /usr partition so that the service file itself gets updates. In the meantime it would also be OK to have it as part of the oem-ec2 ebuild.

@samm-git commented:

@pothos an AMI update sounds like a better idea to me. With running it via Docker I see a number of issues:

  1. Isolation, which we don't really need, plus all the issues that come with it
  2. If Docker fails to start, the agent will fail to start as well -> no way to fix the VM
  3. Not possible to start it in early boot

But yes, I like the idea of putting it in /usr/bin in the base image.

@pothos (Member, Author) commented May 15, 2020

With Docker the isolation can be minimized to share as much with the host as possible.
If needed we can also make the commands available in /usr/bin as small wrappers that either enter the Docker container (to resolve the correct libs), or expose the Docker filesystem and use an LD_PRELOAD trick in the wrapper.
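As a rough illustration of the wrapper idea, a /usr/bin shim could assemble a docker exec invocation like this. The container name ("ssm-agent") and binary name ("ssm-cli") are assumptions for the sketch, and build_cmd only prints the command so the sketch works without a running daemon:

```shell
#!/bin/bash
# Hypothetical wrapper shim, e.g. installed as /usr/bin/ssm-cli.
# Container name "ssm-agent" and binary "ssm-cli" are illustrative.

build_cmd() {
    # docker exec runs the command inside the agent container, so the
    # binary resolves the libraries shipped in the container image.
    echo "docker exec -i ssm-agent ssm-cli $*"
}

# The real wrapper would do: exec docker exec -i ssm-agent ssm-cli "$@"
build_cmd get-instance-information
```

This keeps the host-visible command surface in /usr/bin while the actual binaries and their libraries stay in the container image.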

@samm-git commented:

I created an initial version of the ebuild for the SSM agent.

@samm-git commented May 17, 2020

@pothos I still think that running it in Docker is a bad idea, for the reasons listed above. It would also be different from every other system that bundles it.

About your concern regarding updates: I think that should be solved independently, without using the A/B scheme, either by an AMI upgrade (what I am using now) or via the native updater tool (see amazon-ssm-agent/agent/update). This tool does not seem to be well documented, but it should be trivial to understand how it works and document it.

@samm-git commented May 17, 2020

I found out how the native updater works. It fetches a manifest from a special URL (e.g. https://s3.us-east-1.amazonaws.com/amazon-ssm-us-east-1/ssm-agent-manifest.json), then selects the binary for the right system and performs the update (using URLs derived from the manifest, e.g. https://s3.us-east-1.amazonaws.com/amazon-ssm-us-east-1/amazon-ssm-agent/2.3.1205.0/amazon-ssm-agent-linux-amd64.tar.gz). To support Flatcar, either the vendor needs to provide binaries for it or we would have to use a different manifest source. Probably once the ssm-agent lands in Flatcar we can try to talk to AWS about building and supporting a package for it.
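The download URLs follow a fixed pattern derived from the manifest; a minimal sketch of the "select binary for the right system" step, composing the URL from platform and architecture (the helper name is made up for illustration):

```shell
# Sketch of the native updater's URL composition; the real updater reads
# version and filenames from ssm-agent-manifest.json rather than
# hard-coding them as done here.
BASE="https://s3.us-east-1.amazonaws.com/amazon-ssm-us-east-1"
VERSION="2.3.1205.0"

select_url() {
    # $1 = platform, $2 = architecture
    echo "${BASE}/amazon-ssm-agent/${VERSION}/amazon-ssm-agent-$1-$2.tar.gz"
}

select_url linux amd64
```

Supporting Flatcar through this mechanism would mean either AWS publishing a matching artifact under this scheme or pointing the updater at a different BASE/manifest.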

@pms1969 commented Jul 16, 2020

I think I agree with @samm-git here. We've been running the SSM agent on all our instances for a while, and not via Docker. The whole point of SSM is to give you a great deal of control over the host from the AWS console/APIs, and to do that you'd need to essentially mount every host directory into the Docker container, which completely defeats the purpose of sandboxing it in Docker in the first place.

I noticed that the PR was merged a couple of days ago. Any idea when this will be available in Stable?
Thanks for your hard work here @samm-git; I was thinking we might do the same thing. You've saved me a bit of work.

@vroad commented Dec 12, 2020

According to the commit details, 2605.9.0 contains those changes, but the systemd unit file is not in the filesystem. Do I need to build the AMI myself to enable it?

@dongsupark (Member) commented Dec 15, 2020

We wanted to enable it, but could not.
See flatcar-archive/coreos-overlay#510 for details.

@bjethwan commented:

@samm-git
Hi Alex, can you please clarify whether we can use your AWS AMI with the SSM agent installed and enabled?
I need this badly and I need it fast. In fact, one of the future requirements would be to allow the SSM agent to upgrade itself in prod. I guess it would be a red flag for anyone running a big fleet of Flatcar not to be able to get the operational insights.
I can certainly give a hand with code/build efforts.

@bjethwan commented Jan 4, 2021

@pothos
Do you see any new way of enabling the SSM agent in Flatcar Linux?
It's a red flag for the security reviewers.

@pothos (Member, Author) commented Jan 4, 2021

You can run it in a Docker container (https://thepracticalsysadmin.com/deploy-aws-ssm-agent-to-coreos/), but I would probably add some more Docker parameters to be able to access the host environment from within the container.

@samm-git commented Jan 4, 2021

I am doing it from userdata, without Docker. I have proposed different options to solve it in Flatcar, but so far no luck. I personally see this as a serious issue for Flatcar on AWS.

@samm-git commented Jan 4, 2021

@bjethwan I can share my recipe later if you're interested.

@samm-git commented Jan 4, 2021

@dongsupark maybe we should just extend the OEM partition size as a workaround until the OEM mechanism is recreated?

@pothos (Member, Author) commented Jan 4, 2021

Or we could host some resources that are too large elsewhere and fetch them through Ignition…

@samm-git commented Jan 4, 2021

@pothos this could also be an option; however, it requires connectivity to that location, which is not always a given. This is what I am doing now, from an S3 bucket. However, I still think it must be part of the base image on AWS, like it is with other cloud-native distros.
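For reference, fetching the agent binary through Ignition could be expressed in a Container Linux Config along these lines; the bucket URL and target path are placeholders, not a recommended location:

```yaml
# Container Linux Config sketch (transpile with ct to Ignition JSON);
# URL and path are placeholders
storage:
  files:
    - path: /opt/ssm/amazon-ssm-agent
      filesystem: root
      mode: 0755
      contents:
        remote:
          url: https://example-bucket.s3.amazonaws.com/amazon-ssm-agent
```

This is exactly where the connectivity caveat applies: Ignition runs on first boot and fails if the remote source is unreachable.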

@vroad commented Jan 5, 2021

What I'm doing now is downloading the SSM agent RPM file from the S3 bucket stated in the official AWS docs, then extracting the binary from the RPM.

I used the busybox image for extracting the agent binary because Flatcar doesn't seem to contain the tools required for extracting RPM files.

@samm-git commented Jan 5, 2021

@pothos I still feel that, as a short-term solution, changing the partition size is the way to go.

@pothos (Member, Author) commented Aug 3, 2021

With btrfs compression (just zlib for the start, until GRUB gets updated) we are finally able to fit the 122 MB of binaries into the OEM partition, resulting in a 37 MB used filesystem.

A quick estimate: depending on what we put in, we have space for ~300 MB more of Go binaries until the 128 MB filesystem is full.
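The ~300 MB figure follows from the observed compression ratio; a quick back-of-envelope check with the numbers from the comment:

```shell
# 122 MB of Go binaries compress to a 37 MB filesystem footprint with
# btrfs zlib compression; the OEM filesystem is 128 MB.
uncompressed=122
compressed=37
fs_size=128

awk -v u="$uncompressed" -v c="$compressed" -v f="$fs_size" 'BEGIN {
    ratio = u / c    # ~3.3x compression for Go binaries
    free  = f - c    # 91 MB of compressed space left
    printf "%.1fx ratio, %d MB free, ~%.0f MB headroom\n", ratio, free, free * ratio
}'
```

This prints roughly a 3.3x ratio, 91 MB free, and ~300 MB of headroom, assuming further binaries compress about as well as the agent does.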

@pothos (Member, Author) commented Aug 4, 2021

The SSM Agent is now included in the AWS image (released in the next Alpha): flatcar-archive/coreos-overlay#1162

I think we can close this issue now. Please try it out after the Alpha release and give your feedback, in case we need to reopen this :)

@pothos pothos closed this as completed Aug 4, 2021

6 participants