
Deferred computed fields? #40

Closed · jerch opened this issue Jun 4, 2020 · 6 comments
Labels: backlog, enhancement (New feature or request)

jerch (Collaborator) commented Jun 4, 2020

If a computed field needs a quite expensive drilldown with lots of waits (CPU hungry, waiting for the DB or some other resource), the current realtime / live-sync approach of django-computedfields is quite limiting.

There are ways to implement deferred handling yourself, e.g. with a sentinel computed field indicating a dirty state:

from django.db import models
from computedfields.models import ComputedFieldsModel, computed

class X(ComputedFieldsModel):
    # stores the expensive result as a plain field
    longtaking = models.CharField(max_length=256)  # any concrete field type

    @computed(models.BooleanField(default=True), depends=[
        # contains all dependencies needed by calc_longtaking
        ...
    ])
    def longtaking_dirty(self):
        return getattr(self, '_dirty', True)

    def calc_longtaking(self):
        ...  # the expensive drilldown
        self.longtaking = ...
        self._dirty = False

Here the computed field merely tracks the dirty state based on changed dependencies of the longtaking calculation, while the expensive computation itself can be deferred and run at some later point:

# somewhat in the future
for el in X.objects.filter(longtaking_dirty=True):
    el.calc_longtaking()
    # update field content and its dirty state
    el.save(update_fields=['longtaking', 'longtaking_dirty'])
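
For instance, the deferred pass could run as a periodic task. A minimal sketch, assuming Celery as the scheduler (any cron-like runner works equally well) and the model X from above:

# sketch: periodic batch recomputation of dirty rows (Celery is an assumption)
from celery import shared_task

@shared_task
def recompute_longtaking(batch_size=100):
    # process a bounded slice per run to keep the task runtime predictable
    for el in X.objects.filter(longtaking_dirty=True)[:batch_size]:
        el.calc_longtaking()
        # saving recomputes longtaking_dirty, which now returns False
        el.save(update_fields=['longtaking', 'longtaking_dirty'])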

Wouldn't it be more convenient to have something like this:

@computed_deferred(...)
# or
@computed_async(...)

that automatically deals with the dirty state handling and postponed updates by itself, or via some scheduled update rules?
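
For illustration, usage might look like this (purely hypothetical; neither the decorator nor its internals exist in the package):

# hypothetical API sketch -- computed_deferred does not exist (yet)
class X(ComputedFieldsModel):
    @computed_deferred(models.CharField(max_length=256), depends=[
        # same dependency declarations as for sync computed fields
        ...
    ])
    def longtaking(self):
        # expensive calculation, executed detached from save();
        # dirty-state bookkeeping would be handled by the package
        ...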

Just a rough idea atm; it is also already clear that those would have to be handled quite differently internally:

  • async computed fields may depend on sync ones, but not vice versa
  • needs separate dirty field tracking with separate update paths / map
  • probably needs some custom signals to inform about finally updated values
  • maybe needs some custom update scheduler

Well, that's quite a lot of diverging behavior - not sure if that should be done in this package at all. On the pro side it would still share the dependency resolving work with sync computed fields, and open the package to really complicated, nasty tasks in the method code.

jerch added the enhancement label on Jun 4, 2020
mobiware (Contributor) commented

@jerch a similar use case that may also be addressed by deferred computed fields is non-persisted computed fields. That may sound weird, but imagine we want to index some models, and the indexed values depend on related models. Thanks to the dependency resolving capabilities of DCF we can get notified when a field on a related model changes that impacts indexing, and reindex that model object, but we don't necessarily need to persist the computed field values in the DB for that.
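
A workaround approximating this today might abuse a small persisted computed field purely as a change trigger: the compute method fires the side effect and stores only a cheap value. A sketch under that assumption (the model, the indexer hook and the dependency paths are all made up):

# sketch: computed field as a reindex trigger (all names are illustrative)
from django.db import models
from django.utils import timezone
from computedfields.models import ComputedFieldsModel, computed

def enqueue_reindex(obj):
    ...  # hypothetical hook into the search indexer, e.g. a task queue

class Article(ComputedFieldsModel):
    title = models.CharField(max_length=200)

    @computed(models.DateTimeField(null=True), depends=[
        ('self', ['title']),
        # plus related-model paths that affect the index
    ])
    def index_touched(self):
        enqueue_reindex(self)   # side effect on any relevant change
        return timezone.now()   # only this small value gets persisted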

jerch (Collaborator) commented Jun 23, 2020

@mobiware This would be like a trigger executing stuff on related data changes. Indeed an interesting idea.

jerch added the backlog label on Apr 12, 2022
jerch closed this as completed on Apr 12, 2022
verschmelzen commented

AFAIU this is not yet implemented, is it?

I need such a feature in my project, because for us running updates on post_save is a performance issue. It might also be a logical issue, since we have "setup" stages where parts of objects are unavailable, and thus we need to "suspend" any signals that might require not-yet-initialized fields. So we need more control over the execution flow.

How I see the interaction with the library:

  1. Opt out of all default change management the library provides.
  2. Make database changes, recording all changes made.
  3. Call the library with the recorded changes and receive back a plan/queryset with an .update()/.save() method. This step might lock the dependent records in the database, but not yet "execute" the plan.
  4. Call .save()/execute to apply the changes. This can be hooked manually to the transaction end with on_commit, or by any other means.

(3) and (4) should probably be a single step - it is hard for me to imagine a case where we would need to pre-lock records, but I keep it as a possibility here (a rough sketch follows below).
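
In code the proposed flow could look roughly like this (every name below is hypothetical, none of it exists in the library):

# hypothetical API sketch of the four steps above
from django.db import transaction

with computedfields_suspended():              # 1. opt out of auto updates
    obj.some_field = 'new value'
    obj.save()
    record_change(obj)                        # 2. record the change ourselves

plan = build_update_plan(recorded_changes())  # 3. resolve affected records
transaction.on_commit(plan.save)              # 4. apply at transaction end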

We have lots and lots of logic in django signal handlers and are considering this as a good alternative. Are there any plans to implement more control over execution and change detection? Would you be able to review and accept PRs?

cc: @jerch

jerch (Collaborator) commented Oct 3, 2024

> AFAIU this is not yet implemented, is it?

Yepp, nothing implemented in this regard. It is currently on the backlog, as it seemed of low interest.

From your problem description I am not quite sure if this will help you at all. But maybe it is just me - could you give a short but more explanatory example of your task structures and where the async offloading comes into play?

verschmelzen commented

@jerch sorry for taking so long to answer. My case is probably not related to the OP's request, since my case is synchronous.

If you are interested, this is how we adopted the library in our codebase:

  1. Disable handling of updated models before transaction.atomic() block completion (basically a copy of your django app definition, but with the save signal handlers removed):
# proj/custom_computedfields_app.py

import sys

from computedfields.resolver import BOOT_RESOLVER
from computedfields.settings import settings
from django.apps import AppConfig
from django.db.models.signals import class_prepared


class ComputedfieldsConfig(AppConfig):
    name = 'computedfields'

    def __init__(self, *args, **kwargs):
        super(ComputedfieldsConfig, self).__init__(*args, **kwargs)
        class_prepared.connect(BOOT_RESOLVER.add_model)
        self.settings = settings


    def ready(self):
        # disconnect model discovery to avoid resolver issues with models created later at runtime
        class_prepared.disconnect(BOOT_RESOLVER.add_model)

        # do not run graph reduction in migrations and own commands,
        # that deal with it in their own specific way
        for token in ('makemigrations', 'migrate', 'help', 'rendergraph', 'createmap'):
            if token in sys.argv:  # pragma: no cover
                BOOT_RESOLVER.initialize(True)
                return

        # normal startup
        BOOT_RESOLVER.initialize()

        # connect signals
        from computedfields.handlers import (
            get_old_handler,
            m2m_handler,
            postdelete_handler,
            postsave_handler,
            predelete_handler,
        )
        from django.db.models.signals import m2m_changed, post_delete, post_save, pre_delete, pre_save

        # need to run those manually
        # pre_save.connect(
        #     get_old_handler, sender=None, weak=False, dispatch_uid='COMP_FIELD_PRESAVE')
        # post_save.connect(
        #     postsave_handler, sender=None, weak=False, dispatch_uid='COMP_FIELD')
        pre_delete.connect(
            predelete_handler, sender=None, weak=False, dispatch_uid='COMP_FIELD_PREDELETE')
        post_delete.connect(
            postdelete_handler, sender=None, weak=False, dispatch_uid='COMP_FIELD_POSTDELETE')
        m2m_changed.connect(
            m2m_handler, sender=None, weak=False, dispatch_uid='COMP_FIELD_M2M')
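
Presumably the custom config then replaces the stock app entry in the settings, something like this (the module path is an assumption):

# settings.py (sketch)
INSTALLED_APPS = [
    # ...
    'proj.custom_computedfields_app.ComputedfieldsConfig',  # instead of 'computedfields'
]
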
  2. Then hook our own completion handler into the end of the transaction, something like this:
import contextlib
from collections import defaultdict

from computedfields.models import update_dependent
from django.db.transaction import Atomic, get_connection, on_commit


def update_computedfields():
    # TODO: deletions
    # NOTE: `changes` is our thread-local collecting everything changed during
    # the transaction, to batch-process side effects and notifications at the end
    by_model_class = defaultdict(dict)
    for model in changes.changes:
        by_model_class[type(model)][model.pk] = model
    for model_cls, pks in by_model_class.items():
        queryset = model_cls.objects.filter(pk__in=pks.keys())
        update_dependent(queryset)


@contextlib.contextmanager
def atomic(using=None, savepoint=True, durable=False):  # NOTE: our custom transaction.atomic
    connection = get_connection(using)
    try:
        # only the outermost atomic block registers the commit handler
        add_handler = not connection.in_atomic_block
        with Atomic(using, savepoint, durable):  # NOTE: django's original atomic block
            if add_handler:
                # run our handler at the end of the outermost transaction
                on_commit(update_computedfields, robust=False)
            yield
    finally:
        ...
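
Used like this, the deferred update fires once at commit time (a sketch; changes.record is the hypothetical recording call of the thread-local mentioned above):

# sketch: calling code uses the custom atomic instead of transaction.atomic
with atomic():
    obj.field = 'x'
    obj.save()
    changes.record(obj)   # hypothetical thread-local recording call
# on commit, update_computedfields() runs update_dependent per model class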

So we want to collect a list of the changed records and defer the graph update until the end of the transaction, because we have a lot of "broken" intermediate state during the transaction.

jerch (Collaborator) commented Nov 19, 2024

@verschmelzen Yupp, that still looks pretty synchronous to me. Though I wonder why the normal callstack did not work for your case - was it too computationally heavy?

It is pretty clear that the computation will explode for complicated dependencies, esp. when done as sequential single model object changes. There are some ideas to overcome this by temporarily disabling the recalculation and doing batch calcs at the end, but that introduces its own set of issues, esp. around proper change tracking (see #148).
