Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[REF292] containerization of live_data_server #1

Merged
merged 6 commits into from
Mar 7, 2022

Conversation

KedoKudo
Copy link
Member

@KedoKudo KedoKudo commented Mar 2, 2022

  • Original gitlab story: REF292

This PR adds the Dockerfile and an example docker-compose file for the containerization of the live-data-server.
Corresponding documentation on how to build and run the application as a container is added to the readme.md as well.

@KedoKudo KedoKudo requested a review from rosswhitfield March 2, 2022 22:21
Copy link
Member

@jmborr jmborr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I searched for string systemctl anywhere in the package and found no hits. Do you know how script third-party/systemctl.py is consumed by the app?


# ENV variables that need to be updated/supplied either from CLI or
# docker-compose.yml
ENV DATABASE_NAME=livedata
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These environment variables should not be hardcoded in the image but rather initialized at run-time when the container starts. Is there a reason why you put them in the Dockerfile?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

short version:
Defining the required variable inside the dockerfile is the standard practice.

long version:
The ENV in a docker file is intended to define the env vars that can be later overwritten via docker-compose or CLI.
Therefore the actual values here does not matter as they will definitely be different when live_data_server image is integrated into other system.
If these values are not defined here, the image won't be able to recognize these are env variables, and there is a good chance that the runtime container will have some strange issue with these vars.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Defining the required variable inside the dockerfile is the standard practice.

The standard practice for sensitive information like the database admin account is to avoid encoding it the source, like we're doing in web_reflectivity. You can place those variables in an .envrc file automatically read by command direnv, or you can manually export them in the terminal where you build your images.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are not sensitive information since they are already visible from the source code.
During the deployment, the actual Authentication information is stored locally, so there is no reason to make things extra cryptic.

@peterfpeterson please advise.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Being a bit reductionist: having a password in the source today doesn't mean that it should stay there forever.

If we were going to redeploy this for something other than the test environment, I think we'd need the ability to change where the container looks for the database and the login at runtime so there is only a configuration change (not a rebuild) to go from remote testing environment to production environment. I really don't have a strong preference, but if we leave them hard-coded with the test information now and we work on this repo "later" we'll need to remove them then.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Github doesn't let you reveal secrets once they're set. :(

For the sake of consistency with other projects, let's not put the secrets in the repo and have them as CI/dev provided variables.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are not secrets (even if we remove them from dockerfile, they are still visible from the settings.py, which means we can't hide them).
Also, defining them in dockerfile basically tell future devs that these are the environment vars you have to specify when spinning up a container based on the image.
Currently, live_data_server does not need to talk to the data base during the build, and the db action only happens

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Github doesn't let you reveal secrets once they're set. :(

bummer, I thought the behavior was the same as GitLab's regarding environment variables.
I tried adding a DATABASE_USER and you're right, once set there's no way to see its value.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are not secrets (even if we remove them from dockerfile, they are still visible from the settings.py, which means we can't hide them).

I'm looking at settings.py and these variables are being referenced but I don't see their actual values hardcoded in the source. What do you mean they're visible?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The values stored in the current dockerfiles are dummy ones, which will work locally.
You were suggesting removing the definition completely, which is what I am against as it is pointless:

  • defining them in the dockerfile makes it easier for future devs to know that these are the necessary vars that need to be provided when pinning up a container
  • the values of these env vars in the dockerfile are not important, but it allows one to quickly spin things up for checking.
  • an important point I think is missing at the moment: when spinning up the image as a new container, docker will update/overwrite these var with whatever is provided.

- db-data:/var/lib/postgresql/data/

livedata:
image: live_data:dev
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This I don't understand. Can you clarify the meaning of this line for me? I though you'd be building the image with the Dockerfile instead of what it looks like importing an already created image.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This docker-file is intended as a way to demonstrate how to integrate the live_data_server into an existing system (i.e. it will need some env var defined as well as a predefined db that has the necessary table entries).
Most dev's would build the image locally, and the tag :dev is a common tag used to describe the local developed image.
Adding the building info here doesn't really help as we will not use docker-compose to build the image.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you don't plan on building the image with docker-compose then you'll need to pass the secret database account at docker build time. I suggest to store the docker-build command as a Makefile target because it will probably be a lengthy command.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't need to pass the db info during the building time.
A simple docker build -t TAGNAME . is sufficient.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is as you say provided you leave the database admin account secrets in the Dockerfile. If you remove then, then you need to pass then as arguments to docker build.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am a little bit lost at the moment, so here is what I recommend:

  • leave everything as it is (this is just for documentation purpose according to the original enabler).
    • the actual artifact needed is the image, which is already in Gitlab
  • build a local image of the live_data_server with: docker build -t live_data:dev .
  • spin up the virtual network with docker-compose so that we can verify that the live_data server can talk to the external database via docker-compose up
  • if needed, go to localhost:9999/admin, and you should get 400 error as the livedata server does not have a valid admin page enabled
    • this is the current state of this app, and we are not refactoring/modernizing this web app according to the original Gitlab enabler.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy with everything except that initialization of variables DATABASE_USER, DATABASE_PASS, DATABASE_PORT, and DATABASE_HOST should be removed from the .env and DockerFile files.
@peterfpeterson , sorry but it's up to you to break the tie 😄

@KedoKudo
Copy link
Member Author

KedoKudo commented Mar 3, 2022

The production deployment of live_data_server is via apachectl, which is a background process.
Docker does not want background process running inside a container, therefore we need to used a fake/alt systemctl to bypass this constrain.
Currently we are using the simple runtime server provided by django since the image is (currently) intended for development only.
However, it is possible that someone will try to do a 100% mimic of the actual production server, which will require using systemctl to launch the server in background, which is why this systemctl is added to bypass the constrain from docker.

@peterfpeterson peterfpeterson merged commit 94365ff into main Mar 7, 2022
@peterfpeterson peterfpeterson deleted the REF292_containerization branch March 7, 2022 16:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants