Datacube Installation Scripts [WIP]

Scripts to aid in the setup of the Datacube software and its web interface, the Datacube UI.

Please read the whole README at least once prior to using any of the scripts.

Foreword

This is a work in progress and still incomplete. All scripts are written in batch style and abort automatically on error thanks to bash’s set -e. Manual intervention is required, e.g. to enter the password for sudo, to provide some information the installer needs, or to deal with the configuration dialogs of the Debian package manager. At the moment the scripts are not guaranteed to be idempotent, so running them twice on the same system might fail.

The scripts are developed on a Debian Stretch system and have additionally been tested on Ubuntu 16.04 using upstart instead of systemd. Requirements such as PostgreSQL, Postfix and Apache are installed via apt, while Python packages are installed via conda, falling back to pip where necessary.

The scripts may also work on Debian releases other than Stretch and Ubuntu releases other than 16.04. Be aware, though, that package names, package versions or paths might differ there, so manual intervention might be necessary to make the scripts run cleanly.

Warning

Originally, the scripts were developed with an empty environment in mind, i.e. no PostgreSQL, Apache webserver or Postfix pre-installed. The setup of PostgreSQL and Postfix has already been adjusted to give users the option to configure these services themselves. The Apache setup does not offer this option yet.


Preparation

  • set up a (virtual) machine with Debian Stretch
  • create a regular user with sudo permission
  • install Anaconda to a directory of your choice, e.g. your home directory
  • create a directory called datacube, e.g. in your home directory, where the Datacube’s data and the Datacube UI will reside
  • edit scripts/config and adjust the 3 settings to suit your situation (a hypothetical sketch of this file follows the list)
  • download the satellite data that should be added to the Datacube
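
The following is only a hypothetical sketch of what scripts/config might contain; the actual setting names and the values they expect are defined in the file itself:

File: scripts/config (hypothetical sketch)
------------------------------------------------------------------------------
# Hypothetical example values -- use the real setting names found in scripts/config
ANACONDA_HOME="$HOME/anaconda3"    # where Anaconda was installed
DATACUBE_HOME="$HOME/datacube"     # the datacube directory created above
DB_USER="dc_user"                  # database user for the Datacube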

Setup of the Datacube

Change into the previously created datacube directory and clone this repository. Run scripts/01-install-datacube.sh to install the Datacube software and PostgreSQL and to initialize the database for the Datacube. If you want to configure PostgreSQL manually, this is what the script would change in the default configuration:

File: /etc/postgresql/*/main/pg_hba.conf
------------------------------------------------------------------------------
<file content omitted>
host    all             ${DB_USER}       samenet                 trust

This line is added at the end; ${DB_USER} refers to the username the script asks for.

File: /etc/postgresql/*/main/postgresql.conf
------------------------------------------------------------------------------
max_connections = 1000
unix_socket_directories = '/var/run/postgresql,/tmp'
shared_buffers = 4096MB
work_mem = 64MB
maintenance_work_mem = 256MB
timezone = 'UTC'

For more info, see https://github.com/ceos-seo/data_cube_ui/blob/master/docs/datacube_install.md#postgresql-configuration.
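
If you maintain PostgreSQL yourself, the pg_hba.conf line can also be appended manually along these lines (a sketch, assuming a single PostgreSQL version is installed and DB_USER is set to the username you would otherwise give the script):

echo "host    all             ${DB_USER}       samenet                 trust" | sudo tee -a /etc/postgresql/*/main/pg_hba.conf
sudo systemctl reload postgresql    # or the equivalent of your init system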

Run scripts/02-configure-data-import-environment.sh to install the Python packages required for data import and to create the required directory structure.
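
Taken together, the setup so far boils down to roughly the following commands (a sketch, assuming the scripts are run from inside the cloned repository and that ~/datacube is the directory created during preparation):

cd ~/datacube
git clone https://github.com/ikselven/datacube-setup-scripts.git
cd datacube-setup-scripts
scripts/01-install-datacube.sh
scripts/02-configure-data-import-environment.sh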

Setup of the Datacube UI

Run scripts/03-install-datacube-ui.sh. This will install the Datacube UI itself and required software such as the Apache 2 webserver, Redis, ImageMagick, Django and Postfix.

There are two patches included for the Datacube UI: one that makes the directory where the Datacube UI stores created product files configurable (required for this setup to work), and another that bundles changes between Django 1.11 and 2.

During installation of Postfix, a dialog will be shown. Choose “Internet site”.

If you decide to set up or configure Postfix yourself, here is what the script configures:

File: /etc/postfix/main.cf
-----------------------------------------------------------------------------
inet_interfaces = localhost

For more info, see the end of https://github.com/ceos-seo/data_cube_ui/blob/master/docs/ui_install.md#installation_process
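
If you configure Postfix manually instead, a sketch of the equivalent change (assuming Postfix is already installed) is:

sudo postconf -e 'inet_interfaces = localhost'
sudo systemctl restart postfix    # or the equivalent of your init system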

After that, follow the instructions at the end of the script:

  • complete the configuration and setup of Django
  • apply the workarounds from the ‘Known Issues’ section, if necessary

Finally, to set up Celery as a daemon, run scripts/04-configure-celery.sh.

Import/Ingest Data

Unpack the downloaded scenes into ~/datacube/data/original_data.

Before the ingestion, 3 files need to be created:

  • a product type definition (YAML),
  • a prepare script (Python) and
  • an ingestion configuration (YAML).

Examples for these files can be found in the metadata directory of this repository, in the AGDC repository, in this guide for the Swiss Datacube and in the Datacube repository. For documentation about those files, see the Datacube documentation.

Adjust all file paths in scripts/05-ingest-data.sh so that they correctly refer to the original data and the 3 files from above, then run the script.
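
For orientation only: with the Datacube command-line tools, an ingestion typically follows steps along these lines (the file names below are placeholders; the exact sequence is defined in scripts/05-ingest-data.sh):

datacube product add product_definition.yaml
python prepare_script.py /path/to/original_data
datacube dataset add /path/to/original_data
datacube ingest -c ingestion_configuration.yaml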

Customize the Datacube UI

After ingestion, a new area needs to be configured and associated with the tools available. Please read this section of the UI’s documentation, where the necessary steps are described.


Known Issues

ImportError: /lib/x86_64-linux-gnu/libz.so.1: version `ZLIB_1.2.9' not found in Apache error log

In the current setup with a conda environment, it may happen that Python code run by Apache via mod_wsgi has trouble seeing some of the libraries installed in the conda environment. There are 2 workarounds for this, both of which are hacky:

  1. Download zlib 1.2.9 and run ./configure --prefix=/usr/local && make && sudo make install.
  2. Extend $PATH and $LD_LIBRARY_PATH in /etc/apache2/envvars to include the bin and lib directories of the conda environment for the Datacube (see the sketch below).
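
A sketch of the second workaround, assuming Anaconda lives in /home/<user>/anaconda3 and the conda environment is called cubeenv; adjust both paths to your installation:

File: /etc/apache2/envvars
-----------------------------------------------------------------------------
# appended at the end; makes the conda environment visible to mod_wsgi
export PATH=/home/<user>/anaconda3/envs/cubeenv/bin:$PATH
export LD_LIBRARY_PATH=/home/<user>/anaconda3/envs/cubeenv/lib:$LD_LIBRARY_PATH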

Error “populate() isn’t reentrant” in Apache error log

Run python manage.py check inside the data_cube_ui directory and fix the issues reported there. This message might appear e.g. when using the Datacube UI in combination with Django 2 without having migrated to Django 2.
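
Analogous to the next issue, the check can be run like this (assuming the conda environment is called cubeenv):

conda activate cubeenv
cd /path/to/data_cube_ui
python manage.py check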

Ingested datasets are not shown on the page “Datacube Visualization”

As per this comment in the repository of the Datacube UI, open a terminal and type the following:

conda activate cubeenv
cd /path/to/data_cube_ui
python manage.py shell

In the Python shell, type:

import apps.data_cube_manager.tasks as dcmt
dcmt.update_data_cube_details()

Ingestion does not work in the Datacube UI

The Datacube UI has a lot of hardcoded usernames and pathnames, so some things might not work as expected. For example, ingestion of data via the interface of the Datacube UI uses a hardcoded username and hardcoded database names. If you haven’t set up that user, ingestion via the UI will not work. However, you can ingest data via the command line, using scripts/05-ingest-data.sh.


License

The scripts in this repository are free software distributed under the terms of the GNU General Public License 3.0 or any later version.

Funding

This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 642088. It is related to the project Satellite-based Wetland Observation Service (SWOS) and related work of the Friedrich Schiller University Jena - Department for Earth Observation.
