DEP: | 0014 |
---|---|
Author: | Jake Howard |
Implementation Team: | Jake Howard |
Shepherd: | Carlton Gibson |
Status: | Accepted |
Type: | Feature |
Created: | 2024-02-07 |
Last-Modified: | 2024-05-13 |
Table of Contents
Whilst Django is a web framework, there's more to web applications than just the request-response lifecycle. Sending emails, communicating with external services or running complex actions should all be done outside the request-response cycle.
Django doesn't have a first-party solution for long-running tasks, however the ecosystem is filled with incredibly popular frameworks, all of which interact with Django in slightly different ways. Other frameworks such as Laravel have background workers built-in, allowing them to push tasks into the background to be processed at a later date, without requiring the end user to wait for them to occur.
Library maintainers must implement support for any possible task backend separately, should they wish to offload functionality to the background. This includes smaller libraries, but also larger meta-frameworks with their own package ecosystem such as Wagtail.
This proposal sets out to provide an interface and base implementation for long-running background tasks in Django.
The proposed implementation will be in the form of an application wide "task backend" interface(s). This backend will be what connects Django to the task runners with a single pattern. The task backend will provide an interface for either third-party libraries, or application developers to specify how tasks should be created and pushed into the background.
Alongside this interface, Django will provide a few built-in backends, useful for testing, local development and production use cases.
A backend will be a class which extends a Django-defined base class, and provides the common interface between Django and the underlying task runner.
from datetime import datetime
from typing import Callable
from django.tasks import Task, TaskResult
from django.tasks.backends.base import BaseTaskBackend
class MyBackend(BaseTaskBackend):
task_class = Task
def __init__(self, settings_dict: dict[str, Any]) -> None:
"""
Any connections which need to be setup can be done here
"""
super().__init__(settings_dict)
@classmethod
def validate_task(cls, task: Task) -> None:
"""
Determine whether the provided task is one which can be executed by the backend.
"""
...
def enqueue(self, task: Task, *args, **kwargs) -> TaskResult:
"""
Queue up a task to be executed
"""
...
def get_result(self, result_id: str) -> TaskResult:
"""
Retrieve a result by its id (if one exists).
If one doesn't, raises ResultDoesNotExist.
"""
...
def close(self) -> None:
"""
Close any connections opened as part of the constructor
"""
...
BaseTaskBackend
will provide asynchronous, a
-prefixed versions of enqueue
and get_result
using asgiref.sync_to_async
.
validate_task
determines whether the provided Task
is valid for the backend. This can be used to prevent coroutines from being executed, or otherwise validate the callable. If the provided task is invalid, it will raise InvalidTaskError
.
If a backend cannot support deferred tasks (ie passing the run_after
argument), it should raise InvalidTaskError
. The supports_defer
method can be used to determine whether the backend supports deferring tasks.
Django will ship with the following implementations:
- ImmediateBackend
- This backend runs the tasks immediately, rather than offloading to a background process. This is useful both for a graceful transition towards background workers, but without impacting existing functionality.
- DatabaseBackend
- This backend uses the Django ORM as a task store. This backend will support all features, and should be considered production-grade.
- DummyBackend
- This backend doesn't execute tasks at all, and instead stores the
Task
objects in memory. This backend is mostly useful in tests.
A Task
is the action which the task runner will execute. It is a class which holds a callable and some defaults for enqueue
.
Backend implementors aren't required to implement their own Task
, but may for additional functionality.
from datetime import datetime
from typing import Callable, Self
from django.tasks import Task, TaskResult
class MyBackendTask(Task):
priority: int | None
"""The priority of the task"""
func: Callable
"""The task function"""
queue_name: str | None
"""The name of the queue the task will run on """
backend: str
"""The name of the backend the task will run on"""
run_after: datetime | None
"""The earliest this task will run"""
def using(self, priority: int | None = None, queue_name: str | None = None, run_after: datetime | timedelta | None = None) -> Self:
"""
Create a new task with modified defaults
"""
...
def enqueue(self, *args, **kwargs) -> TaskResult:
"""
Queue up the task to be executed
"""
...
def get_result(self, result_id: str) -> Self:
"""
Retrieve a result for a task of this type by its id (if one exists).
If one doesn't, or is the wrong type, raises ResultDoesNotExist.
"""
...
A Task
is created by decorating a function with @task
:
from django.tasks import task
@task()
def do_a_task(*args, **kwargs):
pass
A Task
can only be created for module-level callables, so that they can be re-imported in the task runner. The task will be validated against the backend's validate_task
during construction.
If a task doesn't define a backend, it is assumed it will only use the default backend.
@task
may be used on functions or coroutines. It will be up to the backend to determine whether coroutines are supported. Support for coroutine tasks can be determined with the supports_coroutine_tasks
method on the backend.
Task arguments must be JSON serializable, to avoid compatibility and versioning issues. Complex arguments should be converted to a format which is JSON-serializable.
The using
method returns a clone of the task with the given attributes modified. This allows modification of the task before calling enqueue
. run_after
cannot be passed to @task
, and can only be configued with using
.
A TaskResult
is used as a handle to the running task, and contains useful information the application may need when referencing the execution of a Task
.
A TaskResult
is obtained either when scheduling a task function, or by calling get_result
on the backend. If called with a task_id
which doesn't exist, a TaskDoesNotExist
exception is raised.
Backend implementors aren't required to implement their own TaskResult
, but may for additional functionality.
from datetime import datetime
from typing import Any, Callable
from django.tasks import TaskResult, ResultStatus, Task
class MyBackendTaskResult(TaskResult):
task: Task
"""The task for which this is a result"""
id: str
"""A unique identifier for the task result"""
status: ResultStatus
"""The status of the running task"""
args: tuple[Any, ...]
"""The arguments to pass to the task function"""
kwargs: dict[str, Any]
"""The keyword arguments to pass to the task function"""
backend: str
"""The name of the backend the task will run on"""
result: Any
"""The return value from the task"""
def refresh(self) -> None:
"""
Reload the cached task data from the task store
"""
...
A TaskResult
will cache its values, relying on the user calling refresh
to reload the values from the task store. An asynchronous version of refresh
is automatically provided by TaskResult
using asgiref.sync_to_async
.
A TaskResult
's status
must be one of the following values (as defined by an enum
):
NEW: | The task has been created, but hasn't started running yet |
---|---|
RUNNING: | The task is currently running |
FAILED: | The task failed |
COMPLETE: | The task is complete, and the result is accessible |
If a backend supports more than these statuses, it should compress them into one of these.
For convenience, calling a Task
will execute the task's function directly, which allows for graceful transitioning towards background tasks:
from django.tasks import task
@task()
def do_a_task(*args, **kwargs):
pass
# Calls `do_a_task` as if it weren't a task
do_a_task()
Tasks can be queued using the enqueue
method, which in turn calls enqueue
on the task backend:
from django.tasks import task
@task(priority=1)
def do_a_task(*args, **kwargs):
pass
# Submit the task function to be run
result = do_a_task.enqueue()
# Optionally, provide arguments
result = do_a_task.enqueue(1, two="three")
# Override the priority defined by the `Task`
result = do_a_task.using(priority=10).enqueue()
# The modified task can be saved and reused
do_a_high_priority_task = do_a_task.using(priority=20)
for i in range(5):
do_a_high_priority_task.enqueue(i)
When multiple task backends are configured, each can be obtained from a global tasks
connection handler. Whilst it's unlikely multiple backends will be configured for a single project, support is available.
from django.tasks import tasks, task
@task()
def do_a_task(*args, **kwargs):
pass
# Submit the task function to be run
result = tasks["special"].enqueue(do_a_task)
# Optionally, provide arguments
result = tasks["special"].enqueue(do_a_task, 1, two="three")
# Alternatively
result = do_a_task.using(backend="special").enqueue(1, two="three")
Whilst this API is available, it's best to call enqueue
on the Task
directly instead and configure the backend using the backend
argument.
If a Task
is defined to run on a different backend, InvalidTaskError
is raised.
Tasks may also be "deferred" to run at a specific time in the future, by passing the run_after
argument:
from django.utils import timezone
from datetime import timedelta
# Run the task at a specific time.
result = do_a_task.using(run_after=timezone.now() + timedelta(minutes=5)).enqueue()
# Or, pass the `timedelta` directly.
result = do_a_task.using(run_after=timedelta(minutes=5)).enqueue()
run_after
must be a timedelta
or timezone-aware datetime
.
When deferring a task, it may not be exactly that time a task is executed, however it should be accurate to within a few seconds. This will depend on the current state of the queue and task runners, and is out of the control of Django.
One of the easiest and most common places that offloading work to the background can be performed is sending emails. Sending an email requires communicating with an external, potentially third-party service, which adds additional latency and risk to web requests. These can be easily offloaded to the background.
Django will ship with an additional task-based SMTP email backend, configured identically to the existing SMTP backend. The other backends included with Django don't benefit from being moved to the background.
Backends may also provide an asynchronous interface for task enqueueing, using a
-prefixed methods:
await do_a_task.aenqueue()
await do_a_task.using(priority=10).aenqueue()
Similarly, backends may support enqueueing coroutines:
from django.tasks import task
@task()
async def do_an_async_task():
pass
await do_an_async_task.aenqueue()
# Also works
do_an_async_task.enqueue()
TASKS = {
"default": {
"BACKEND": "django.tasks.backends.ImmediateBackend",
"QUEUES": []
"OPTIONS": {}
}
}
QUEUES
contains a list of valid queue names for the backend. If a task is queued to a queue which doesn't exist, an exception is raised. If omitted or empty, any name is valid.
Having a first-party interface for background workers poses 2 main benefits:
Firstly, it lowers the barrier to entry for offloading computation to the background. Currently, a user needs to research different worker technologies, follow their integration tutorial, and modify how their tasks are called. Instead, a developer simply needs to install the dependencies, and work out how to run the background worker. Similarly, a developer can start determining which actions should run in the background before implementing a true background worker, and avoid refactoring should the backend change over time.
Secondly, it allows third-party libraries to offload some of their execution. Currently, library maintainers need to either accept their code will run inside the request-response lifecycle, or provide hooks for application developers to offload actions themselves. This can be particularly helpful when offloading certain expensive signals.
One of the key benefits behind background workers is removing the requirement for the user to wait for tasks they don't need to, moving computation and complexity out of the request-response cycle, towards dedicated background worker processes. Moving certain actions to be run in the background not only improves performance of web requests, but also allows those actions to run on specialised hardware, potentially scaled differently to the web servers. This presents an opportunity to greatly decrease the percieved execution time of certain common actions performed by Django projects.
The target audience for DatabaseBackend
and a SQL-based queue are likely fairly well aligned with those who may choose something like PostgreSQL FTS over something like ElasticSearch. ElasticSearch is probably better for those 10% of users who really need it, but doesn't mean the other 90% won't be perfectly happy with PostgreSQL, and probably wouldn't benefit from ElasticSearch anyway.
The most obvious alternative to this DEP would be to standardise on a task implementation and vendor it in to Django. The Django ecosystem is already full of background worker libraries, eg Celery and RQ. Writing a production-ready task runner is a complex and nuanced undertaking, and discarding the work already done is a waste.
This proposal doesn't seek to replace existing tools, nor add yet another option for developers to consider. The primary motivation is creating a shared API contract between worker libraries and developers. It does however provide a simple way to get started, with a solution suitable for most sizes of projects (DatabaseBackend
). Slowly increasing features, adding more built-in storage backends and a first-party task runner aren't out of the question for the future, but must be done with careful planning and consideration.
This proposed implementation specifically doesn't assume anything about the user's setup. This not only reduces the chances of Django conflicting with existing task systems a user may be using (eg Celery, RQ), but also allows it to work with almost any hosting environment a user might be using.
This proposal started out as Wagtail RFC 72, as it was becoming clear a unified interface for background tasks was required, without imposing on a developer's decisions for how the tasks are executed. Wagtail is run in many different forms at many different scales, so it needed to be possible to allow developers to choose the backend they're comfortable with, in a way which Wagtail and its associated packages can execute tasks without assuming anything of the environment it's running in.
The API design has been intentionally designed with type-safety in mind, including support for statically validating task arguments. Using Task.enqueue
allows its arguments to be statically typed, and using
allows the Task
to be immutable (much like QuerySet
). Types should be able to flow from the task function, through the Task
and eventually to the TaskResult
.
So that library maintainers can use this integration without concern as to whether a Django project has configured background workers, the default configuration will use the ImmediateBackend
. Developers on older versions of Django but who need libraries which assume tasks are available can use the reference implementation, which will serve as a backport and be API-compatible with Django.
For users who need newer libraries which require this interface, but can't update Django itself, the reference implementation can be used. Users can use either django_tasks.task
or django.tasks.task
to register a task, which is usable with any configured backend, regardless of its source.
A reference implementation (django_tasks
) is being developed alongside this DEP process. This implementation will serve as an "early-access" demo to get initial feedback and start using the interface, as the basis for the integration with Django itself, but also as a backport for users of supported Django versions prior to this work being released.
The reference implementation can be found at https://github.com/RealOrangeOne/django-core-tasks, along with its progression.
The field of background tasks is vast, and attempting to implement everything supported by existing tools in the first iteration is futile. The following functionality has been considered, and deemed explicitly out of scope of the first pass, but still worthy of future development:
- Completion / failed hooks, to run subsequent tasks automatically
- Bulk queueing
- Automated task retrying
- A generic way of executing task runners. This will remain the responsibility of the underlying implementation, and the user to execute correctly.
- Observability into task queues, including monitoring and reporting
- Cron-based scheduling
- Task timeouts
- Swappable argument serialization (eg pickle)
This document has been placed in the public domain per the Creative Commons CC0 1.0 Universal license (http://creativecommons.org/publicdomain/zero/1.0/deed).