Backend — Setup Your DWD Harvester

Jonas Jaszkowic edited this page Jun 4, 2024 · 9 revisions

The DWD harvester is a Python script packaged as a Docker image that can be run as a GitHub Action on a schedule. It will:

  • Pull RADOLAN data from the DWD's server
  • Clip this data to the city boundary so that only the rain data for Berlin remains
  • Generate a GeoJSON file with the rain grid
  • Generate a CSV file with the trees and the rain sum of the last 30 days
  • Generate vector tiles on Mapbox for mobile usage
  • Push the generated data to Supabase storage
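
To make the "rain sum of the last 30 days" step more concrete, here is a minimal sketch of a windowed sum over timestamped rain readings. It is only an illustration: the function name `rain_sum_last_days` and the list-of-tuples data layout are hypothetical, and the real harvester works on the RADOLAN grid and the database rather than on an in-memory list.

```python
from datetime import datetime, timedelta


def rain_sum_last_days(readings, now, days=30):
    """Sum rainfall (in mm) for readings that fall within the last `days` days.

    `readings` is an iterable of (timestamp, millimetres) tuples.
    This layout is an assumption made for the example.
    """
    cutoff = now - timedelta(days=days)
    return sum(mm for ts, mm in readings if cutoff < ts <= now)


# Hypothetical readings for a single grid cell.
now = datetime(2024, 6, 4, 12, 0)
readings = [
    (datetime(2024, 6, 4, 11, 0), 1.5),
    (datetime(2024, 5, 20, 8, 0), 2.5),
    (datetime(2024, 4, 1, 8, 0), 10.0),  # older than 30 days, so it is ignored
]

print(rain_sum_last_days(readings, now))  # 4.0
```

The `days` parameter plays a similar role to the `LIMIT_DAYS` variable in the workflow file, although how the harvester interprets that variable is not covered here.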

Refer to the following diagram to understand the relationship between the frontend, the backend and the DWD harvester. We are aware that this setup is overly complex and should be simplified; it grew this way over time.

[Diagram: gdk_dwd_final]

To collect rain data for a city other than Berlin, you will have to update the source code with a shapefile of your city. See the detailed documentation in the DWD harvester repository.

You will also have to create a GitHub Actions environment and some encrypted repository secrets.
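
The workflow file on this page passes its secrets and variables to the container as environment variables. If you want to try the Docker image outside GitHub Actions first, a small preflight check along these lines can list what is still unset. This is a minimal sketch: the variable names are copied from that workflow file, while the helper `missing_vars` is hypothetical, and treating every variable as mandatory is our assumption, not necessarily the harvester's behavior.

```python
import os

# Names copied from the `env:` section of the workflow file on this page.
EXPECTED_VARS = [
    "PG_SERVER", "PG_PORT", "PG_USER", "PG_PASS", "PG_DB",
    "SUPABASE_URL", "SUPABASE_BUCKET_NAME", "SUPABASE_SERVICE_ROLE_KEY",
    "OUTPUT", "LOGGING",
    "MAPBOXUSERNAME", "MAPBOXTOKEN", "MAPBOXTILESET", "MAPBOXLAYERNAME",
    "SKIP_MAPBOX", "LIMIT_DAYS", "SURROUNDING_SHAPE_FILE",
]


def missing_vars(env=None):
    """Return the expected variable names that are unset or empty in `env`."""
    if env is None:
        env = os.environ
    return [name for name in EXPECTED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_vars()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("All expected environment variables are set.")
```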

We run this from the giessdenkiez-de repository, where the frontend is located. You also need a secret token from your Mapbox account to upload data. The token should have the following scopes:

  • DATASETS:WRITE
  • UPLOADS:READ
  • TILESETS:LIST
  • UPLOADS:LIST
  • TILESETS:READ
  • UPLOADS:WRITE
  • TILESETS:WRITE
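
When creating the token, it is easy to miss one of these scopes. The following sketch compares the scopes you granted (for example, copied from the token page of your Mapbox account) against the list above. The helper `missing_scopes` is hypothetical and does not call the Mapbox API.

```python
# Scopes required by the list above. Mapbox shows scopes in lowercase,
# for example "uploads:write", so the comparison ignores case.
REQUIRED_SCOPES = {
    "DATASETS:WRITE",
    "UPLOADS:READ",
    "UPLOADS:LIST",
    "UPLOADS:WRITE",
    "TILESETS:LIST",
    "TILESETS:READ",
    "TILESETS:WRITE",
}


def missing_scopes(granted):
    """Return the required scopes that are absent from `granted`, sorted by name."""
    normalized = {scope.strip().upper() for scope in granted}
    return sorted(REQUIRED_SCOPES - normalized)


# Example: a token that only has two of the required scopes.
print(missing_scopes(["uploads:read", "tilesets:write"]))
```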

In one of your repos, create the file .github/workflows/dwd-harvester.yml and add the following code to it. For reference, see our current action at https://github.com/technologiestiftung/giessdenkiez-de/blob/master/.github/workflows/rain.yml.

name: DWD Radolan Harvester Unified

on:
  workflow_dispatch:
    inputs:
      environment:
        description: "Environment (e.g. development or production)"
        required: true
        type: string
  repository_dispatch:
    # This action can be triggered via the GitHub API webhook (see https://docs.github.com/en/actions/using-workflows/events-that-trigger-workflows#repository_dispatch)
    types: [radolan_cron]

jobs:
  rain:
    # Using the payload of the repository_dispatch webhook to specify the environment
    environment: "${{ github.event.inputs.environment || github.event.client_payload.environment }}"
    runs-on: ubuntu-latest
    name: Aggregate rain data from DWD radolan
    steps:
      - name: Harvester
        uses: docker://technologiestiftung/giessdenkiez-de-dwd-harvester:v2.5.0
        id: harvester
        env:
          PG_SERVER: ${{ secrets.PG_SERVER }}
          PG_PORT: ${{ secrets.PG_PORT }}
          PG_USER: ${{ secrets.PG_USER }}
          PG_PASS: ${{ secrets.PG_PASS }}
          PG_DB: ${{ secrets.PG_DB }}
          SUPABASE_URL: ${{ vars.SUPABASE_URL }}
          SUPABASE_BUCKET_NAME: ${{ vars.SUPABASE_DATA_ASSETS_BUCKET }}
          SUPABASE_SERVICE_ROLE_KEY: ${{ secrets.SUPABASE_SERVICE_ROLE_KEY }}
          OUTPUT: "True"
          LOGGING: "INFO"
          MAPBOXUSERNAME: ${{ secrets.MAPBOXUSERNAME }}
          MAPBOXTOKEN: ${{ secrets.MAPBOXTOKEN }}
          MAPBOXTILESET: ${{ secrets.MAPBOXTILESET }}
          MAPBOXLAYERNAME: ${{ secrets.MAPBOXLAYERNAME }}
          SKIP_MAPBOX: "False"
          LIMIT_DAYS: ${{ vars.LIMIT_DAYS }}
          SURROUNDING_SHAPE_FILE: ${{ vars.SURROUNDING_SHAPE_FILE }}

To test whether this works as desired, run the action using the workflow_dispatch trigger from the GitHub user interface.
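
The workflow also declares a repository_dispatch trigger of type radolan_cron, which an external scheduler can fire through the GitHub REST API. The sketch below builds such a request with Python's standard library. The function names, the placeholder owner and repository, and the way the token is supplied are assumptions for illustration. The event type and the `environment` key in the client payload come from the workflow file above.

```python
import json
import urllib.request


def build_dispatch_request(owner, repo, token, environment):
    """Build a POST request that creates a repository_dispatch event."""
    url = f"https://api.github.com/repos/{owner}/{repo}/dispatches"
    payload = {
        # Must match `types: [radolan_cron]` in the workflow file.
        "event_type": "radolan_cron",
        # Read in the workflow as github.event.client_payload.environment.
        "client_payload": {"environment": environment},
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


def send_dispatch(request):
    """Send the request and return the HTTP status (GitHub answers 204 on success)."""
    with urllib.request.urlopen(request) as response:
        return response.status
```

To use it, call `send_dispatch(build_dispatch_request("your-org", "your-repo", token, "production"))` with a token that is allowed to create dispatch events for the repository. Note that GitHub only fires repository_dispatch for workflow files on the default branch.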