
Improve/optimize current Dockerfile and split containers #900

Closed

jeffwidman opened this issue Mar 8, 2016 · 1 comment

Comments

@jeffwidman (Contributor)

Is there any reason the redash Docker container runs both the app and Celery?

Seems like it'd be better to break Celery out into its own container, especially since there's an official Docker Celery image: https://hub.docker.com/_/celery/

@arikfr (Member) commented Mar 8, 2016

The reason it's done this way right now is that it was the shortest path to success. It should be changed, but not into separate images, rather into separate containers: the workers and the web server can still share the same image and just run a different command when executed.
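
In Docker terms, that's one image backing several containers that differ only in the command they run. A minimal sketch of the idea (image and container names here are placeholders, not redash's actual ones):

docker build -t example/app .
docker run -d --name web    example/app api
docker run -d --name worker example/app worker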

Also, the Dockerfile should be reorganized to improve caching. For the hosted version I'm using the following Dockerfile:

FROM ubuntu:trusty

EXPOSE 5000

RUN useradd --system --comment " " --create-home redash

# Ubuntu packages
RUN apt-get update && \
    apt-get install -y python-pip python-dev curl build-essential libffi-dev sudo wget \
    # Postgres client
    libpq-dev \
    # Additional packages required for data sources:
    libssl-dev libmysqlclient-dev freetds-dev && \
    # Cleanup
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*

RUN pip install -U setuptools

# Set the WORKDIR to /app so all following commands run in /app
WORKDIR /app

COPY requirements.txt requirements_dev.txt requirements_all_ds.txt ./
RUN pip install -r requirements.txt -r requirements_dev.txt -r requirements_all_ds.txt

# Adding the whole repository to the container
COPY . ./
RUN chown -R redash /app

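# The start script dispatches to worker, scheduler, or api (see below)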
ENTRYPOINT ["/app/bin/start"]

For example, COPYing the requirements files before copying the rest of the repository ensures that as long as we don't change dependencies, we can reuse the cached pip install layer.
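
A quick way to see the effect (a sketch; the touched file is just an example of an application-code-only change):

docker build -t redash/redash .   # first build installs all the Python dependencies
touch redash/__init__.py          # change application code, not the requirements files
docker build -t redash/redash .   # rebuild reuses the cached pip install layer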

It's also simpler than the existing one because it doesn't install the frontend (Node.js) dependencies. That step should probably be kept, though, for people who want to build locally.

The start script is something like the following:

#!/bin/bash
set -e

get_config() {
  ENV_NAME=${ENV_NAME:-production}

  if [ "$ENV_NAME" = "production" ]
  then
    # redacted...
  fi
}

worker() {
  WORKERS_COUNT=${WORKERS_COUNT:-2}
  QUEUES=${QUEUES:-queries,scheduled_queries,celery}

  echo "Starting $WORKERS_COUNT workers for queues: $QUEUES..."
  exec sudo -E -u redash /usr/local/bin/celery worker --app=redash.worker -c$WORKERS_COUNT -Q$QUEUES -linfo --maxtasksperchild=10 -Ofair
}

scheduler() {
  WORKERS_COUNT=${WORKERS_COUNT:-1}
  QUEUES=${QUEUES:-celery}

  echo "Starting scheduler and $WORKERS_COUNT workers for queues: $QUEUES..."

  exec sudo -E -u redash /usr/local/bin/celery worker --app=redash.worker --beat -c$WORKERS_COUNT -Q$QUEUES -linfo --maxtasksperchild=10 -Ofair
}

api() {
  exec sudo -E -u redash /usr/local/bin/gunicorn -b 0.0.0.0:5000 -k gevent --name redash -w4 redash.wsgi:app
}


help() {
  echo "Usage: "
  echo "$(basename "$0") {worker|scheduler|api}"
}

# Dispatch on the first argument; match "$1" rather than "$@" so extra
# arguments don't break the match
case "$1" in
  worker)
    get_config
    shift
    worker
    ;;
  api)
    get_config
    shift
    api
    ;;
  scheduler)
    get_config
    shift
    scheduler
    ;;
  *)
    help
    ;;
esac

And then usage is something like: docker run redash/redash worker or docker run redash/redash api.
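
A fuller sketch covering all three roles from the same image (the -d/--name flags and container names are illustrative, not prescribed):

docker run -d --name redash-server    redash/redash api
docker run -d --name redash-worker    redash/redash worker
docker run -d --name redash-scheduler redash/redash scheduler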

@arikfr changed the title from "Break out Celery into it's own Docker container" to "Improve/optimize current Dockerfile and split containers" on Mar 8, 2016