chore: update docs for new alerts and reporting feature #13104

Merged
merged 18 commits into from
Feb 26, 2021
Changes from 8 commits
303 changes: 302 additions & 1 deletion docs/src/pages/docs/installation/email_reports.mdx
@@ -6,8 +6,309 @@ index: 10
version: 1
---

## Scheduling and Emailing Reports
## Alerts and Reports
Contributor

I'm not really comfy about putting the Alerts & Reports doc in a page named email-reports, as this doc concerns alerts too, and not only emails but also Slack 🤷

Contributor Author

I'd definitely agree, was really just for organisation so that in the menu there isn't 'Alerts and reporting' and then below it 'Scheduling and Emailing reports' which would be confusing imo, but if @srinify can help we could put it into its own section 'Alerts and Reporting', and then have the legacy reporting stuff remain at the bottom of this file?

Contributor

I agree, renaming and keeping the old approach at the end seems best!

(version 1.0.1 and above)

Users can configure automated alerts and reports to send charts and dashboards to an email recipient or Slack channel.

- Alerts are sent when a specified condition is met
- Reports are sent on a specified schedule

### Turning on Alerts and reports
Alerts and reports are not turned on by default. They are currently behind a feature flag, and require some additional services and configurations.

#### Requirements:

- `Dockerfile`
    - webdriver to run a headless browser (for taking screenshots of the charts and dashboards)
- `docker-compose.yaml`
    - Redis message broker
    - replacing the SQLite DB with a Postgres DB
    - Celery worker
    - Celery beat
- `superset_config.py`
    - feature flag set to `True`
    - all configs as outlined in the template below
- At least one of these is needed to send alerts and reports:
    - (optional) SMTP server for sending email
    - (optional) Slack app integration for sending to Slack channels
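
The "at least one of these" requirement can be checked before starting the services. This is a minimal sketch, assuming the setting names and placeholder values used in the `superset_config.py` template in this document (`SLACK_API_TOKEN`, `SMTP_HOST`, `SMTP_USER`, `SMTP_PASSWORD`). The helper `enabled_channels` is hypothetical and not part of Superset.

```python
def enabled_channels(config: dict) -> list:
    """Return the notification channels that look configured.

    `config` is a plain dict of setting names to values. The placeholder
    values checked below are the ones used in this document's template.
    """
    channels = []

    # The template ships the bare prefix "xoxb-" as a placeholder token.
    token = config.get("SLACK_API_TOKEN") or ""
    if token and token != "xoxb-":
        channels.append("slack")

    # Treat SMTP as configured only when host, user and password are all set
    # and the credentials are not the template placeholders.
    smtp_ready = all(config.get(k) for k in ("SMTP_HOST", "SMTP_USER", "SMTP_PASSWORD"))
    placeholders = (
        config.get("SMTP_USER") == "your_user"
        or config.get("SMTP_PASSWORD") == "your_password"
    )
    if smtp_ready and not placeholders:
        channels.append("email")

    return channels
```

An empty result would mean that neither delivery method is set up, so alerts and reports would have nowhere to go.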

#### Summary of steps to turn on alerts and reporting:

Using the templates below,
1. Create a new directory and create the Dockerfile
2. Build the extended image using the Dockerfile
3. Create the `docker-compose.yaml` file in the same directory
4. Create a new sub directory called `config`
5. Create the `superset_config.py` file in the `config` sub directory
6. Run the image using `docker-compose up` in the same directory as the `docker-compose.yaml` file
7. In a new terminal window, upgrade the DB by running `docker exec -it superset-1.0.1-extended superset db upgrade`

is there a better way to run

superset db upgrade
superset init
superset fab create-admin

as part of startup without entering the container?

Member

i think the superset-init service in the docker-compose workflow should take care of this.

8. Then run `docker exec -it superset-1.0.1-extended superset init`
9. Then set up your admin user if need be: `docker exec -it superset-1.0.1-extended superset fab create-admin`
10. Finally, restart the running instance: `CTRL-C`, then `docker-compose up`

(Note: v1.0.1 is current at the time of writing; change the version number if a newer version is available.)
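
Steps 7 and 8 do not need an interactive terminal, so they could be scripted. This is a minimal sketch, assuming the container is reachable under the name used in the steps above (substitute the actual name shown by `docker ps`). The helper names `bootstrap_commands` and `run_bootstrap` are hypothetical. `superset fab create-admin` is left out because it prompts for input.

```python
import subprocess

# Name used in the steps above; replace with the real container name if it differs.
CONTAINER = "superset-1.0.1-extended"


def bootstrap_commands(container: str = CONTAINER) -> list:
    """Build the non-interactive `docker exec` invocations for steps 7 and 8."""
    steps = (["superset", "db", "upgrade"], ["superset", "init"])
    return [["docker", "exec", container, *step] for step in steps]


def run_bootstrap(container: str = CONTAINER) -> None:
    """Run each command in order and stop at the first failure."""
    for cmd in bootstrap_commands(container):
        subprocess.run(cmd, check=True)
```

Calling `run_bootstrap()` once the containers are up would replace the two manual `docker exec` calls.
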
### Dockerfile

A webdriver (and headless browser) is needed to capture screenshots of the charts and dashboards, which are then sent to the recipient. As the base image does not have a webdriver installed by default, we need to extend the base image and install the webdriver (this template uses the Chrome webdriver). We are also adding connectors for MySQL and Postgres, as well as Redis and Flower (Flower and MySQL are optional depending on your requirements).

You can extend the image by running this Docker build command from the directory that contains the Dockerfile:
`docker build -t superset-1.0.1-extended -f Dockerfile .`

Config for `Dockerfile`:
```docker
FROM apache/superset:1.0.1
USER root
RUN apt update
RUN wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb && \
    apt install -y --no-install-recommends ./google-chrome-stable_current_amd64.deb && \
    wget https://chromedriver.storage.googleapis.com/88.0.4324.96/chromedriver_linux64.zip && \
Member

@dpgaspar dpgaspar Feb 16, 2021

These two versions will eventually differ

    unzip chromedriver_linux64.zip && \
    chmod +x chromedriver && \
    mv chromedriver /usr/bin && \
    apt autoremove -yqq --purge && \
    apt clean && \
    rm -f google-chrome-stable_current_amd64.deb chromedriver_linux64.zip
Contributor

I'm curious, would you have similar working instructions to install geckodriver? My attempts have been unsuccessful...

Contributor Author

I couldn't get it working with geckodriver/firefox, no matter which version I tried, it wouldn't launch correctly when trying to take the screenshot... I suspect there is a missing config somewhere that is needed for it

Contributor

It works with geckodriver too, we can adapt the doc to propose both approaches.

RUN pip install --no-cache-dir gevent
RUN pip install --no-cache-dir mysqlclient
RUN pip install --no-cache-dir psycopg2
RUN pip install --no-cache-dir redis
RUN pip install --no-cache-dir flower
USER superset

```
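
The Dockerfile installs whatever the current stable Chrome is but pins chromedriver to 88.0.4324.96, so the two can drift apart, as noted in the review. This is a minimal sketch of a check, assuming that a matching major version is the compatibility rule; the helper names are hypothetical.

```python
def major_version(version: str) -> int:
    """Return the leading numeric component of a dotted version string."""
    return int(version.strip().split(".")[0])


def versions_compatible(chrome: str, chromedriver: str) -> bool:
    """True when the browser and the driver share the same major version."""
    return major_version(chrome) == major_version(chromedriver)
```

For example, `versions_compatible("88.0.4324.150", "88.0.4324.96")` is `True`, while a Chrome 89 build paired with the pinned driver gives `False`.
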
### Docker compose
The docker compose file lists the services that will be used when running the image. The specific services needed for alerts and reporting are outlined below.

#### Redis message broker
To ferry requests between the celery worker and the Superset instance, we use a message broker. This template uses Redis.

#### Replacing SQLite with Postgres
While it might be possible to use SQLite for alerts and reporting, it is highly recommended to use a more production-ready DB for Superset in general. Our template uses Postgres.

#### Celery worker
The worker will process the tasks that need to be performed when an alert or report is fired.

#### Celery beat
The beat is the scheduler that tells the worker when to perform its tasks. This schedule is defined when you create the alert or report.

#### Full `docker-compose.yaml` configuration
The Redis, Postgres, Celery worker and Celery beat services are defined in the template:

Config for `docker-compose.yaml`:
```yaml
version: '3.6'
services:
  redis:
    image: redis:6.0.9-buster
    restart: on-failure
    volumes:
      - redis:/data
  postgres:
    image: postgres
    restart: on-failure
    environment:
      POSTGRES_DB: superset
      POSTGRES_PASSWORD: superset
      POSTGRES_USER: superset
    volumes:
      - db:/var/lib/postgresql/data
  worker:
    image: superset-1.0.1-extended
    restart: on-failure
    healthcheck:
      disable: true
    depends_on:
      - superset
      - postgres
      - redis
    command: "celery worker --app=superset.tasks.celery_app:app --pool=gevent --concurrency=500"
    volumes:
      - ./config/:/app/pythonpath/
  beat:
    image: superset-1.0.1-extended
    restart: on-failure
    healthcheck:
      disable: true
    depends_on:
      - superset
      - postgres
      - redis
    command: "celery beat --app=superset.tasks.celery_app:app --pidfile /tmp/celerybeat.pid --schedule /tmp/celerybeat-schedule"
    volumes:
      - ./config/:/app/pythonpath/
  superset:
    image: superset-1.0.1-extended
    restart: on-failure
    environment:
      - SUPERSET_PORT=8088
    ports:
      - "8088:8088"
    depends_on:
      - postgres
      - redis
    command: gunicorn --bind 0.0.0.0:8088 --access-logfile - --error-logfile - --workers 5 --worker-class gthread --threads 4 --timeout 200 --limit-request-line 4094 --limit-request-field_size 8190 superset.app:create_app()
    volumes:
      - ./config/:/app/pythonpath/
volumes:
  db:
    external: true
  redis:
    external: false
```

### Superset_config.py

The following configurations need to be added to the `superset_config.py` file. This file is loaded when the image runs, and any configurations in it will override the default configurations found in the `config.py`.

You will need to add your custom SMTP settings and/or Slack app token.

Config for `superset_config.py`:
```python
from superset_config import *
from celery.schedules import crontab
from cachelib import RedisCache
from superset.typing import CacheConfig
import os

FEATURE_FLAGS = {
    "ALERT_REPORTS": True
}

# Slack API token (optional)
SLACK_API_TOKEN = "xoxb-"
SLACK_PROXY = None

POSTGRES_USER = "superset"
POSTGRES_PASS = "superset"
POSTGRES_HOST = "postgres"
POSTGRES_PORT = "5432"
POSTGRES_DATABASE = "superset"
# Must resolve to your Redis service (the docker-compose.yaml template names it `redis`)
REDIS_HOST = "redis-superset"
REDIS_PORT = "6379"
# The SQLAlchemy connection string.
SQLALCHEMY_DATABASE_URI = 'postgresql+psycopg2://%s:%s@%s:%s/%s?client_encoding=utf8' % (POSTGRES_USER,
                                                                                       POSTGRES_PASS,
                                                                                       POSTGRES_HOST,
                                                                                       POSTGRES_PORT,
                                                                                       POSTGRES_DATABASE)
CACHE_CONFIG: CacheConfig = {
    'CACHE_TYPE': 'redis',
    'CACHE_DEFAULT_TIMEOUT': 24*60*60,  # 1 day - if your alerts are running faster than 1 day, the cache should match
    'CACHE_KEY_PREFIX': 'superset_',
    'CACHE_REDIS_URL': 'redis://%s:%s/1' % (REDIS_HOST, REDIS_PORT)
}
DATA_CACHE_CONFIG: CacheConfig = {
    'CACHE_TYPE': 'redis',
    'CACHE_DEFAULT_TIMEOUT': 24*60*60,  # 1 day - if your alerts are running faster than 1 day, the cache should match
    'CACHE_KEY_PREFIX': 'data_',
    'CACHE_REDIS_URL': 'redis://%s:%s/1' % (REDIS_HOST, REDIS_PORT)
}
THUMBNAIL_SELENIUM_USER = "admin"
THUMBNAIL_CACHE_CONFIG: CacheConfig = {
    'CACHE_TYPE': 'redis',
    'CACHE_DEFAULT_TIMEOUT': 24*60*60*30,
    'CACHE_KEY_PREFIX': 'thumbnail_',
    'CACHE_NO_NULL_WARNING': True,
    'CACHE_REDIS_URL': 'redis://%s:%s/1' % (REDIS_HOST, REDIS_PORT)
}
SCREENSHOT_LOCATE_WAIT = 100
SCREENSHOT_LOAD_WAIT = 600
RESULTS_BACKEND = RedisCache(host=REDIS_HOST, port=REDIS_PORT, key_prefix='superset_results')


class CeleryConfig(object):
    BROKER_URL = 'redis://%s:%s/0' % (REDIS_HOST, REDIS_PORT)
    CELERY_IMPORTS = ('superset.sql_lab', "superset.tasks", "superset.tasks.thumbnails", )
    CELERY_RESULT_BACKEND = 'redis://%s:%s/0' % (REDIS_HOST, REDIS_PORT)
    CELERYD_PREFETCH_MULTIPLIER = 10
    CELERY_ACKS_LATE = True
    CELERY_ANNOTATIONS = {
        'sql_lab.get_sql_results': {
            'rate_limit': '100/s',
        },
        'email_reports.send': {
            'rate_limit': '1/s',
            'time_limit': 600,
            'soft_time_limit': 600,
            'ignore_result': True,
        },
    }
    CELERYBEAT_SCHEDULE = {
        'reports.scheduler': {
            'task': 'reports.scheduler',
            'schedule': crontab(minute='*', hour='*'),
        },
        'reports.prune_log': {
            'task': 'reports.prune_log',
            'schedule': crontab(minute=0, hour=0),
        },
        'cache-warmup-hourly': {
            'task': 'cache-warmup',
            'schedule': crontab(minute='*/30', hour='*'),
            'kwargs': {
                'strategy_name': 'top_n_dashboards',
                'top_n': 10,
                'since': '7 days ago',
            },
        },
    }


CELERY_CONFIG = CeleryConfig

# SMTP email configuration
EMAIL_REPORTS_USER = "admin"  # change to a user that has access to the chart / dashboard that you are sending
EMAIL_PAGE_RENDER_WAIT = 300
EMAIL_NOTIFICATIONS = True

SMTP_HOST = "smtp.sendgrid.net"  # change to your host
SMTP_STARTTLS = True
SMTP_SSL = False
SMTP_USER = "your_user"
SMTP_PORT = 2525  # your port, e.g. 587
SMTP_PASSWORD = "your_password"
SMTP_MAIL_FROM = "noreply@youremail.com"

WEBDRIVER_TYPE = "chrome"
WEBDRIVER_OPTION_ARGS = [
    "--force-device-scale-factor=2.0",
    "--high-dpi-support=2.0",
    "--headless",
    "--disable-gpu",
    "--disable-dev-shm-usage",
    "--no-sandbox",
    "--disable-setuid-sandbox",
    "--disable-extensions",
]

WEBDRIVER_BASEURL = "http://superset:8088"
WEBDRIVER_BASEURL_USER_FRIENDLY = "http://localhost:8088"  # change to your domain, e.g. https://superset.mydomain.com - this is the link that is sent to the recipient
SUPERSET_WEBSERVER_ADDRESS = "localhost"
SUPERSET_WEBSERVER_PORT = 8088
SUPERSET_WEBSERVER_TIMEOUT = 600

```
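
The template builds its connection strings with `%` formatting from hard-coded values. As an alternative sketch, the same strings could be built by small helpers that percent-encode the credentials (useful when a password contains characters such as `@`) and read secrets from the environment. The helper names and the environment variable names `SUPERSET_DB_PASSWORD` and `SUPERSET_REDIS_HOST` are assumptions for illustration only; the defaults mirror the template.

```python
import os
from urllib.parse import quote_plus


def postgres_uri(user, password, host, port, database):
    """Build the SQLAlchemy URI, percent-encoding the credentials."""
    return "postgresql+psycopg2://{}:{}@{}:{}/{}?client_encoding=utf8".format(
        quote_plus(user), quote_plus(password), host, port, database
    )


def redis_url(host, port, db):
    """Build a Redis URL for the given logical database number."""
    return "redis://{}:{}/{}".format(host, port, db)


# Hypothetical environment variable names; the defaults mirror the template.
POSTGRES_PASS = os.environ.get("SUPERSET_DB_PASSWORD", "superset")
REDIS_HOST = os.environ.get("SUPERSET_REDIS_HOST", "redis-superset")

SQLALCHEMY_DATABASE_URI = postgres_uri("superset", POSTGRES_PASS, "postgres", "5432", "superset")
BROKER_URL = redis_url(REDIS_HOST, "6379", 0)       # Celery broker uses database 0
CACHE_REDIS_URL = redis_url(REDIS_HOST, "6379", 1)  # caches use database 1
```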

### Summary
With the extended image created by using the `Dockerfile`, and then running that image using `docker-compose.yaml`, plus the required configurations in the `superset_config.py` you should now have alerts and reporting working correctly.

- For Kubernetes you can see the Helm chart here
- The above templates also work in a Docker swarm environment; you would just need to add `deploy:` to the Superset, Redis and Postgres services along with your specific configs for your swarm

### Optional - Slack integration
To send alerts and reports to a Slack channel, you need to create a new Slack app in your workspace.
1. Head to https://api.slack.com/apps
2. Create a new app and give it a name (e.g. Superset)
3. Under the OAuth and Permissions section, give the following scopes to the app:
    1. `incoming-webhook`
    2. `calls:write`
4. At the top of the OAuth and Permissions section, click 'install to workspace'
5. Select a default channel for the app to post to and continue. (You can post to any channel by inviting your Superset app into that channel)
6. The app should now be installed on the workspace, and a 'Bot User OAuth Access Token' should be created. Copy the OAuth token and add it into the Slack section in the `superset_config.py`
7. Restart the service (or run `superset init`) to pull in the new configuration.
8. Note when sending to the channel from the alerts and reports UI, set the channel without the leading '#' eg. use `alerts` instead of `#alerts`
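
The channel-name rule in step 8 can be expressed as a small helper. This is a minimal sketch; `normalize_channel` is a hypothetical name and not part of Superset or the Slack SDK.

```python
def normalize_channel(name: str) -> str:
    """Strip surrounding whitespace and any leading '#' from a channel name."""
    return name.strip().lstrip("#")
```

With this, both `normalize_channel("#alerts")` and `normalize_channel("alerts")` give `alerts`, the form expected by the alerts and reports UI.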


#
## Scheduling and Emailing Reports
(version 0.38 and below)
### Email Reports

Email reports allow users to schedule email reports for: