Scripts to aid in the setup of the Datacube software and its web interface, the Datacube UI.
Please read the whole README at least once prior to using any of the scripts.
This is a work in progress and is still incomplete. All scripts are written in
batch style, aborting automatically on error thanks to bash's `set -e`. Some
manual intervention is still required, e.g. to enter the password for sudo,
to enter some information this installer needs, or to deal with the configuration
dialogs of the Debian package manager. At the moment, the scripts are not
guaranteed to be idempotent, so running them twice on the same system
might fail.
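As a minimal sketch of this batch-style pattern (the packages shown are only illustrative, not what the scripts install at this point):

```bash
#!/bin/bash
# "set -e" makes bash abort the whole script as soon as any command fails,
# so later steps never run on top of a broken earlier step.
set -e

sudo apt-get update                   # asks for the sudo password (manual step)
sudo apt-get install -y postgresql    # a package configuration dialog may appear here
echo "Reached only if every previous command succeeded."
```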
The scripts were developed on a Debian Stretch system and have additionally been
tested on Ubuntu 16.04 using upstart instead of systemd. Requirements such as
PostgreSQL, Postfix and Apache are installed via `apt`, while Python packages
are installed via `conda`, falling back to `pip` if necessary.
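A hedged sketch of that fallback pattern; the package name and channel are examples rather than the scripts' actual package list:

```bash
# Try conda first; fall back to pip if conda cannot provide the package.
conda install -y -c conda-forge datacube || pip install datacube
```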
The scripts may also work on Debian releases other than Stretch or Ubuntu releases other than 16.04. Be aware, though, that package names, package versions or paths might differ there, so manual intervention might be necessary to make the scripts run cleanly.
Originally, the scripts were developed with an empty environment in mind, i.e. no PostgreSQL, Apache web server or Postfix pre-installed. The setup of PostgreSQL and Postfix has already been adjusted to give the user the option to configure these themselves. The Apache setup does not offer this option yet.
- set up a (virtual) machine with Debian Stretch
- create a regular user with sudo permission
- install Anaconda to a directory of your choice, e.g. your home directory
- create a directory called `datacube`, e.g. in your home directory, where the Datacube's data and the Datacube UI will reside
- edit `scripts/config` and adjust the 3 settings to suit your situation (see the sketch after this list)
- download satellite data that should be added to the datacube
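A hedged shell sketch of some of these preparation steps; the user name, Anaconda installer version and paths are examples, not values the scripts mandate:

```bash
sudo adduser datacube-user                 # create a regular user ...
sudo usermod -aG sudo datacube-user        # ... and give it sudo permission
bash Anaconda3-2019.03-Linux-x86_64.sh -b -p "$HOME/anaconda3"  # any prefix works
mkdir -p "$HOME/datacube"                  # Datacube data and the Datacube UI live here
```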
Change into the previously created `datacube` directory and clone this
repository. Run `scripts/01-install-datacube.sh` to install the Datacube
software and PostgreSQL and to initialize the database for the Datacube. If you
want to configure PostgreSQL manually, this is what the script changes in
the default configuration:
```
File: /etc/postgresql/*/main/pg_hba.conf
------------------------------------------------------------------------------
<file content omitted>
host    all             ${DB_USER}      samenet                 trust
```
This line is added at the end; `${DB_USER}` refers to the username the
script asks for.
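Assuming `dc_user` was given as the username (a hypothetical value), the `trust` entry lets that user connect from hosts on the server's local subnets without a password. A quick check, with the database name being an assumption as well:

```bash
psql -h <server-address> -U dc_user -d datacube -c 'SELECT 1;'
```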
```
File: /etc/postgresql/*/main/postgresql.conf
------------------------------------------------------------------------------
max_connections = 1000
unix_socket_directories = '/var/run/postgresql,/tmp'
shared_buffers = 4096MB
work_mem = 64MB
maintenance_work_mem = 256MB
timezone = 'UTC'
```
For more info, see https://github.com/ceos-seo/data_cube_ui/blob/master/docs/datacube_install.md#postgresql-configuration.
Run `scripts/02-configure-data-import-environment.sh` to install the Python
packages required for data import and to create the required directory
structure.
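For example (assuming, as the rest of this README suggests, that `~/datacube/data/original_data` is part of the created structure):

```bash
scripts/02-configure-data-import-environment.sh
ls ~/datacube/data/original_data   # downloaded scenes will be unpacked here later
```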
Run `scripts/03-install-datacube-ui.sh`. This installs the
Datacube UI itself and required software such as the Apache 2 web server, Redis,
ImageMagick, Django and Postfix.
Two patches for the Datacube UI are included: one makes the directory where the Datacube UI stores created product files configurable (required for this setup to work), and the other bundles changes between Django 1.11 and 2.
During installation of Postfix, a dialog will be shown. Choose “Internet site”.
If you decide to set up and configure Postfix yourself, here is what the script configures:
```
File: /etc/postfix/main.cf
------------------------------------------------------------------------------
inet_interfaces = localhost
```
For more info, see the end of https://github.com/ceos-seo/data_cube_ui/blob/master/docs/ui_install.md#installation_process.
After that, follow the instructions at the end of the script:
- complete the configuration and setup of Django
- apply the workarounds from the ‘Known Issues’ section, if necessary
Finally, to set up Celery as a daemon, run `scripts/04-configure-celery.sh`.
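To verify afterwards that the daemons came up, something along these lines should work; the service names follow the common Celery daemonization layout and are an assumption here:

```bash
sudo service celeryd status      # Celery worker daemon
sudo service celerybeat status   # Celery beat scheduler
```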
Unpack the downloaded scenes into `~/datacube/data/original_data`.
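For example, for a scene delivered as a tarball (the file name is a placeholder):

```bash
tar -xzf <scene>.tar.gz -C ~/datacube/data/original_data/
```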
Before the ingestion, 3 files need to be created:
- a product type definition (YAML),
- a prepare script (Python) and
- an ingestion configuration (YAML).
Examples for these files can be found in this repository in `metadata`, in the
AGDC repository, in this guide for the Swiss Datacube and in the
Datacube repository.
For documentation about those files, see the Datacube documentation.
Adjust all file paths in `scripts/05-ingest-data.sh` to correctly refer to the
original data and the 3 files from above. Run the script.
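For orientation, the run roughly follows the standard Datacube command-line sequence sketched below; the file names are placeholders, and the authoritative commands are the ones in `scripts/05-ingest-data.sh`:

```bash
datacube product add product_definition.yaml           # register the product type
python prepare_script.py ~/datacube/data/original_data/<scene>   # write dataset metadata
datacube dataset add ~/datacube/data/original_data/<scene>/agdc-metadata.yaml
datacube ingest -c ingestion_config.yaml               # run the ingestion itself
```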
After ingestion, a new area needs to be configured and associated with the tools available. Please read this section of the UI’s documentation, where the necessary steps are described.
In the current setup with a conda environment, it may happen that Python code run by Apache via mod_wsgi fails to see some of the libraries installed in the conda environment. There are two workarounds for this, both of them hacky:
- Download Zlib 1.2.9 and run `./configure --prefix=/usr/local/; make; sudo make install`
- Extend `$PATH` and `$LD_LIBRARY_PATH` in `/etc/apache2/envvars` to include the `bin` and the `lib` directory of the conda environment for the datacube (see the sketch below).
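A sketch of the second workaround, assuming Anaconda lives in `/home/user/anaconda3` and the environment is called `cubeenv` (the environment name used elsewhere in this README; the path is an example):

```bash
# Appended to /etc/apache2/envvars so processes started by Apache,
# including mod_wsgi, inherit the conda environment's bin and lib directories:
export PATH=/home/user/anaconda3/envs/cubeenv/bin:$PATH
export LD_LIBRARY_PATH=/home/user/anaconda3/envs/cubeenv/lib:$LD_LIBRARY_PATH
```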
Run `python manage.py check` inside the `data_cube_ui` directory and fix the
issues reported there. Such issues might appear e.g. when using the Datacube UI
in combination with Django 2 without having migrated to Django 2.
As per this comment in the repository of the Datacube UI, open a terminal and type the following:
```bash
conda activate cubeenv
cd /path/to/data_cube_ui
python manage.py shell
```
In the Python shell, type:

```python
import apps.data_cube_manager.tasks as dcmt
dcmt.update_data_cube_details()
```
The Datacube UI has a lot of hardcoded usernames and pathnames, so some things
might not work as expected. E.g. ingestion of data via the interface of the
Datacube UI uses a hardcoded username and hardcoded database names. If you
haven't set up that user, ingestion via the UI will not work. However, you can
ingest data via the command line, using `scripts/05-ingest-data.sh`.
The scripts in this repository are free software distributed under the terms of the GNU General Public License 3.0 or any later version.
This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 642088. It is related to the project Satellite-based Wetland Observation Service (SWOS) and related work of the Friedrich Schiller University Jena - Department for Earth Observation.