This repository has been archived by the owner on May 8, 2023. It is now read-only.


Databricks

Code style: black


Getting started with Databricks development in Time Series

Install the tools needed for development

  • Docker

    • Use WSL 2. You will get a prompt with a guide after installing Docker.

  • Visual Studio Code (system installer)

    • The extension called Remote - Containers

Get workspace ready for development

  • Open the geh-timeseries folder in Visual Studio Code

  • Select Remote Explorer in the left toolbar

  • Click the plus icon at the top of the panel, to the right of Containers, and select Open Current Folder in Container

  • Wait for the container to build (this takes a few minutes the first time). Then you are ready to go.

Running Tests

  • To run all tests, execute the following command in the workspace terminal:

    pytest

  • For more verbose output, and to see printed output (-s disables pytest's output capturing), use:

    pytest -vv -s

  • To run the tests in a specific folder, navigate to that folder in the terminal and run the same command as above

  • To run the tests in a specific file, navigate to the folder containing the file and execute:

    pytest file-name.py

  • To run a single test in that file, execute:

    pytest file-name.py::function-name
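As a minimal sketch, a test file such as the hypothetical `test_example.py` below contains plain `test_`-prefixed functions that pytest discovers automatically. `pytest test_example.py::test_add_positive` would run only the first test. The file name, the `add` function and the test names are illustrative and are not part of this repository.

```python
# test_example.py (hypothetical file, for illustration only)


def add(a, b):
    """Toy function under test; stands in for real project code."""
    return a + b


def test_add_positive():
    # Run on its own with: pytest test_example.py::test_add_positive
    assert add(2, 3) == 5


def test_add_negative():
    # Run on its own with: pytest test_example.py::test_add_negative
    assert add(-2, -3) == -5
```

Running `pytest test_example.py` with no `::` suffix runs both tests.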
    

Debugging Tests

Use the Python Test Explorer for Visual Studio Code extension. It is installed automatically in the container (see .devcontainer/devcontainer.json).

Alternative Debug Approach

This approach is less intuitive, but it can serve as a fallback if the recommended way of debugging does not work.

  • To debug tests, start pytest under the debugger with one of the following commands

    Using debugz.sh:

    sh debugz.sh

    Or running the command inside debugz.sh directly:

    python -m ptvsd --host 0.0.0.0 --port 3000 --wait -m pytest -v

    The --wait flag makes the process wait until a debugger attaches before the tests start.
    
  • Create a launch.json file in the Run and Debug panel and add the following entry to its configurations array:

    {
        "name": "Python: Attach container",
        "type": "python",
        "request": "attach",
        "port": 3000,
        "host": "localhost"
    }
  • In the Run and Debug panel, select Python: Attach container and start debugging
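If you prefer to start the debug session from Python instead of the shell, the invocation in debugz.sh can be sketched as below. This assumes ptvsd is installed in the container, which the command above implies. The host and port must match the launch.json entry. The helper names `build_debug_command` and `run_debug_session` are hypothetical.

```python
import subprocess
import sys

# Must match "port" in the launch.json attach configuration.
DEBUG_HOST = "0.0.0.0"
DEBUG_PORT = 3000


def build_debug_command(pytest_args=("-v",)):
    """Return the argv that runs pytest under ptvsd and waits for an attach."""
    return [
        sys.executable,  # the interpreter running this script, instead of "python"
        "-m", "ptvsd",
        "--host", DEBUG_HOST,
        "--port", str(DEBUG_PORT),
        "--wait",  # block until VS Code attaches
        "-m", "pytest",
        *pytest_args,
    ]


def run_debug_session(pytest_args=("-v",)):
    """Start the session; returns pytest's exit code once the run finishes."""
    return subprocess.run(build_debug_command(pytest_args), check=False).returncode
```

For example, `run_debug_session(("-vv", "-s"))` waits on port 3000 until you attach with Python: Attach container.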

Styling and Formatting

We follow PEP 8 as closely as possible, using Flake8 and Black. The following Flake8 codes are ignored:

  • Module imported but unused (F401)
  • Module level import not at top of file (E402)
  • Whitespace before ':' (E203) (needed for Black to work well with Flake8; see the Black documentation)
  • Line too long (82 > 79 characters) (E501) (only ignored in the CI step)
  • Line break occurred before a binary operator (W503) (Black's formatting does not follow this rule)

The Flake8 ignore lists are defined in tox.ini and ci.yml.

We are using standard Black code style.
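To illustrate why E203 and W503 are on the ignore list, the hypothetical functions below are written the way Black formats them. Without those ignores, Flake8 would report E203 on the slice and W503 on each continuation line that starts with an operator. Black's default line length is 88 characters, which also explains why E501 (79 characters) can conflict with Black output. The function names are illustrative only.

```python
# Illustrative Black-formatted code that Flake8 flags unless E203/W503 are ignored.


def window(values, start, offset):
    # Black puts spaces around ':' when slice bounds are complex expressions;
    # Flake8 reports this as E203 (whitespace before ':').
    return values[start + offset : start + offset + 3]


def total_price(
    subtotal_before_adjustments,
    shipping_and_handling_cost,
    customer_loyalty_discount_amount,
    value_added_tax_amount,
):
    # The expression is too long for one line, so Black splits it and breaks
    # *before* each binary operator; Flake8 reports this as W503.
    return (
        subtotal_before_adjustments
        + shipping_and_handling_cost
        - customer_loyalty_discount_amount
        + value_added_tax_amount
    )
```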

Test Python code in CI pipeline

Building and publishing a Docker image for testing

In the CI pipeline, the tests are executed against a Docker image, which is described in a Dockerfile.

A new Docker image is built and published by the Docker CD pipeline. A new image is published only when changes are made to the files listed in the paths sections of the workflow.
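The paths filter can be pictured as a simple match between the changed files and a list of glob patterns, as in the minimal sketch below. The patterns in `WATCHED_PATHS` and the function name are hypothetical; the real patterns live in the workflow's paths section. `fnmatch` only approximates GitHub's glob rules (for example, its `*` also matches `/`).

```python
from fnmatch import fnmatch

# Hypothetical patterns; the real list is in the Docker CD workflow's paths section.
WATCHED_PATHS = (
    ".docker/*",
    ".github/workflows/docker-cd.yml",
)


def should_publish_image(changed_files, patterns=WATCHED_PATHS):
    """Return True if any changed file matches one of the watched patterns."""
    return any(
        fnmatch(path, pattern) for path in changed_files for pattern in patterns
    )
```

Under these assumed patterns, a commit that only touches Python sources would not publish a new image.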

If a pull request triggers the publication of a new Docker image, a new version of the image is published on each commit. Images published while a pull request is open are considered pre-releases. A pre-release image is assigned a tag in the format pre-release-pr<PR-number>, e.g. pre-release-pr311. When the pull request has been merged, the Docker CD pipeline runs again and publishes a new latest version.
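The pre-release tag format above can be expressed as a pair of helpers: one builds the tag from a PR number and one recovers the number from a tag. This is a minimal sketch; the function names are hypothetical and not part of the pipeline.

```python
import re
from typing import Optional

PRE_RELEASE_PREFIX = "pre-release-pr"


def pre_release_tag(pr_number: int) -> str:
    """Build the pre-release image tag for a pull request, e.g. pre-release-pr311."""
    if pr_number <= 0:
        raise ValueError("PR numbers are positive integers")
    return f"{PRE_RELEASE_PREFIX}{pr_number}"


def pr_number_from_tag(tag: str) -> Optional[int]:
    """Return the PR number encoded in a pre-release tag, or None (e.g. for 'latest')."""
    match = re.fullmatch(r"pre-release-pr(\d+)", tag)
    return int(match.group(1)) if match else None
```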

Running the tests using a published Docker image

The default Docker image used for testing is the newest "latest"-tagged version of the databricks-unit-test-image, stored in GitHub Packages, which includes a container registry.

In a pull request, you can change the version of the Docker image used for running the tests. For example, if a pull request changes the Dockerfile, it may be relevant to run the test suite against the new Docker image. To do this, change the image reference in the docker-compose.yml file to, e.g., ghcr.io/energinet-datahub/geh-timeseries/databricks-unit-test:pre-release-pr311.
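The image switch can also be scripted. The minimal sketch below rewrites the tag on any `image:` line that points to the databricks-unit-test repository. It assumes docker-compose.yml uses the standard `image:` key with an explicit tag. The helper names are hypothetical.

```python
import re
from pathlib import Path

IMAGE_REPO = "ghcr.io/energinet-datahub/geh-timeseries/databricks-unit-test"

# Matches e.g. "image: ghcr.io/.../databricks-unit-test:latest" and captures the part
# before the tag so that only the tag is replaced.
_IMAGE_LINE = re.compile(rf"(image:\s*{re.escape(IMAGE_REPO)}):[\w.-]+")


def set_image_tag(compose_text: str, tag: str) -> str:
    """Return compose_text with every IMAGE_REPO reference pointed at the given tag."""
    return _IMAGE_LINE.sub(lambda m: f"{m.group(1)}:{tag}", compose_text)


def update_compose_file(path: Path, tag: str) -> None:
    """Rewrite the compose file in place (hypothetical convenience helper)."""
    path.write_text(set_image_tag(path.read_text(), tag))
```

For example, `update_compose_file(Path("docker-compose.yml"), "pre-release-pr311")` switches the tests to that pull request's image. Remember to revert the change before merging if the default image should remain "latest".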