V7 Darwin Python SDK

⚡️ Official library for annotating and managing datasets and models on V7's Darwin Training Data Platform. ⚡️

darwin-py can be used both from the command line and as a Python library.

Main functions include (but are not limited to):

Client authentication
Listing local and remote datasets
Create/remove datasets
Upload/download data to/from remote datasets
Direct integration with PyTorch dataloaders
Extracting video artifacts

Tested with Python 3.9 to 3.12.

🏁 Installation

pip install darwin-py

You can now type darwin in your terminal and access the command line interface.

If you wish to use the PyTorch bindings, use the ml extra to install all the additional requirements:

pip install darwin-py[ml]

If you wish to use video frame extraction, use the ocv extra to install all the additional requirements:

pip install darwin-py[ocv]

If you wish to use video artifact extraction, you also need to install FFmpeg.

To run the tests, first install the test extra:

pip install darwin-py[test]

Configuration

Retry Configuration

The SDK includes a retry mechanism for handling API rate limits (429) and server errors (500, 502, 503, 504). You can configure the retry behavior using the following environment variables:

DARWIN_RETRY_INITIAL_WAIT: Initial wait time in seconds between retries (default: 60)
DARWIN_RETRY_MAX_WAIT: Maximum wait time in seconds between retries (default: 300)
DARWIN_RETRY_MAX_ATTEMPTS: Maximum number of retry attempts (default: 10)

Example configuration:

# Configure shorter retry intervals and fewer attempts
export DARWIN_RETRY_INITIAL_WAIT=30
export DARWIN_RETRY_MAX_WAIT=120
export DARWIN_RETRY_MAX_ATTEMPTS=5

The retry mechanism will automatically handle:

Rate limiting (HTTP 429)
Server errors (HTTP 500, 502, 503, 504)

For each retry attempt, you'll see a message indicating the type of error and the wait time before the next attempt.
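
To get a feel for how the three variables interact, the sketch below reads them and prints a possible wait schedule. It assumes the wait doubles after every attempt and is capped at the maximum wait; the README does not state the actual backoff curve, so treat the doubling rule and the helper name `retry_waits` as illustrative assumptions rather than the SDK's implementation.

```python
import os


def retry_waits(env=None):
    """Return the planned wait times (seconds) between retry attempts.

    Assumption: waits double after each attempt and are capped at the
    maximum wait. The SDK's real backoff curve may differ.
    """
    env = os.environ if env is None else env
    initial = int(env.get("DARWIN_RETRY_INITIAL_WAIT", "60"))
    max_wait = int(env.get("DARWIN_RETRY_MAX_WAIT", "300"))
    attempts = int(env.get("DARWIN_RETRY_MAX_ATTEMPTS", "10"))
    return [min(initial * 2**i, max_wait) for i in range(attempts)]


# Values from the example configuration above
example = {
    "DARWIN_RETRY_INITIAL_WAIT": "30",
    "DARWIN_RETRY_MAX_WAIT": "120",
    "DARWIN_RETRY_MAX_ATTEMPTS": "5",
}
print(retry_waits(example))       # [30, 60, 120, 120, 120]
print(sum(retry_waits(example)))  # 450
```

Under that assumption, the example configuration would give up after at most 450 seconds of waiting, which can help when choosing values for batch jobs.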

Development

See our development and QA environment installation recommendations here

Usage as a Command Line Interface (CLI)

See the V7 Labs documentation on CLI usage.

Once installed, darwin is accessible as a command line tool. A useful way to explore the CLI is the -h/--help option, which provides additional information for each available command.

Client Authentication

To perform remote operations on Darwin you first need to authenticate. This requires a team-specific API-key. If you do not already have a Darwin account, you can contact us and we can set one up for you.

To start the authentication process:

$ darwin authenticate
API key:
Make example-team the default team? [y/N] y
Datasets directory [~/.darwin/datasets]:
Authentication succeeded.

You will then be prompted to enter your API key, to choose whether to set the corresponding team as the default, and finally to pick the location on the local file system for that team's datasets. This process creates a configuration file at ~/.darwin/config.yaml. The file is updated by future authentications for other teams.
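
A script that depends on the CLI being authenticated can check for the configuration file before doing any work. The sketch below only tests that the file exists at the path mentioned above; it does not read the file or validate the key, and the helper names `darwin_config_path` and `is_authenticated` are hypothetical.

```python
from pathlib import Path


def darwin_config_path(home=None):
    """Path where `darwin authenticate` writes its configuration."""
    home = Path.home() if home is None else Path(home)
    return home / ".darwin" / "config.yaml"


def is_authenticated(home=None):
    """True if a configuration file exists (the key inside may still be invalid)."""
    return darwin_config_path(home).is_file()


# A home directory that does not exist has no configuration file
print(is_authenticated("/nonexistent-home"))  # False
```

Calling `is_authenticated()` with no argument checks the current user's home directory.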

Listing local and remote datasets

Lists a summary of existing local datasets.

$ darwin dataset local
NAME            IMAGES     SYNC_DATE         SIZE
mydataset       112025     yesterday     159.2 GB

Lists a summary of remote datasets accessible by the current user.

$ darwin dataset remote
NAME                       IMAGES     PROGRESS
example-team/mydataset     112025        73.0%
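
If you need the listing inside a script, the table can be turned into records. The sketch below parses text shaped like the sample output above; the column layout is taken from that sample and may differ between versions, and the function name `parse_remote_listing` is hypothetical. It also assumes dataset names contain no spaces.

```python
def parse_remote_listing(text):
    """Parse the table printed by `darwin dataset remote` into dicts."""
    rows = []
    for line in text.strip().splitlines()[1:]:  # skip the header row
        name, images, progress = line.split()
        rows.append(
            {
                "name": name,
                "images": int(images),
                "progress": float(progress.rstrip("%")),
            }
        )
    return rows


sample = """\
NAME                       IMAGES     PROGRESS
example-team/mydataset     112025        73.0%
"""
print(parse_remote_listing(sample))
# [{'name': 'example-team/mydataset', 'images': 112025, 'progress': 73.0}]
```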

Create/remove a dataset

To create an empty dataset remotely:

$ darwin dataset create test
Dataset 'test' (example-team/test) has been created.
Access at https://darwin.v7labs.com/datasets/579

The dataset will be created in the team you're authenticated for.

To delete the dataset on the server:

$ darwin dataset remove test
About to delete example-team/test on darwin.
Do you want to continue? [y/N] y

Upload/download data to/from a remote dataset

To upload data to an existing remote dataset, pass the dataset name and the path to a single image, or to a directory of images/videos.

The -e/--exclude argument specifies file extensions to ignore in the data directory, e.g. -e .jpg

For videos, the frame extraction rate can be specified by adding --fps <frame_rate>

Supported extensions:

Video files: .mp4, .bpm, .mov
Image files: .jpg, .jpeg, .png

$ darwin dataset push test /path/to/folder/with/images
100%|████████████████████████| 2/2 [00:01<00:00,  1.27it/s]
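
To preview which files in a folder would qualify before pushing, the extension rules described above can be mimicked locally. This is a minimal sketch, not the CLI's own filter: the extension set is copied from the list in this README, matching is case-insensitive by assumption, and `files_to_upload` is a hypothetical helper.

```python
from pathlib import Path

# Extensions as listed in this README; the SDK may accept more.
SUPPORTED = {".mp4", ".bpm", ".mov", ".jpg", ".jpeg", ".png"}


def files_to_upload(paths, exclude=()):
    """Keep paths with a supported extension that is not excluded."""
    excluded = {e.lower() for e in exclude}
    keep = []
    for p in map(Path, paths):
        ext = p.suffix.lower()  # assumption: extensions match case-insensitively
        if ext in SUPPORTED and ext not in excluded:
            keep.append(p)
    return keep


candidates = ["a.jpg", "b.PNG", "clip.mp4", "notes.txt"]
print([p.name for p in files_to_upload(candidates, exclude=[".jpg"])])
# ['b.PNG', 'clip.mp4']
```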

Before a dataset can be downloaded, a release needs to be generated:

$ darwin dataset export test 0.1
Dataset test successfully exported to example-team/test:0.1

This version is immutable. If new images or annotations have been added, you will have to create a new release to include them.

To list all available releases:

$ darwin dataset releases test
NAME                           IMAGES     CLASSES                   EXPORT_DATE
example-team/test:0.1               4           0     2019-12-07 11:37:35+00:00

Finally, to download a release:

$ darwin dataset pull test:0.1
Dataset example-team/test:0.1 downloaded at /directory/choosen/at/authentication/time .
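
The commands above identify a release as team/dataset:version, with the team optional when a default team is set. When scripting around the CLI it can help to split such an identifier into its parts. The sketch below infers the format from the examples in this section only; `ReleaseRef` and `parse_release_ref` are hypothetical names, not part of darwin-py.

```python
from typing import NamedTuple, Optional


class ReleaseRef(NamedTuple):
    team: Optional[str]
    dataset: str
    version: Optional[str]


def parse_release_ref(ref):
    """Split '[team/]dataset[:version]' into its parts."""
    path, _, version = ref.partition(":")
    team, sep, dataset = path.rpartition("/")
    return ReleaseRef(team if sep else None, dataset, version or None)


print(parse_release_ref("example-team/test:0.1"))
# ReleaseRef(team='example-team', dataset='test', version='0.1')
print(parse_release_ref("test:0.1"))
# ReleaseRef(team=None, dataset='test', version='0.1')
```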

Usage as a Python library

See the V7 Labs documentation on usage as a Python library.

The framework is designed to be usable as a standalone Python library. Usage can be inferred from the operations performed in darwin/cli_functions.py. A minimal example that downloads a dataset is provided below, and a more extensive one can be found in ./darwin_demo.py.

from darwin.client import Client

client = Client.local() # use the configuration in ~/.darwin/config.yaml
dataset = client.get_remote_dataset("example-team/test")
dataset.pull() # downloads annotations and images for the latest exported version

Follow this guide for how to integrate darwin datasets directly in PyTorch.
