Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mitigate problems with EntityManager::flush() reentrance since 2.16.0 #10906

Closed
wants to merge 2 commits into from

Conversation

mpdude
Copy link
Contributor

@mpdude mpdude commented Aug 10, 2023

This PR addresses the issue brought up in #10869. It happens when users use event listeners – postPersist in particular – to make changes to entities and/or persist new entities and finally call EntityManager::flush(), while the UnitOfWork is currently within the commit phase.

There is a discussion in #10900 whether this kind of reentrance should be deprecated in 2.x and prevented in 3.x. But, in order to prevent complete breakage with the 2.16.0 update, this PR tries to apply a band-aid 🩹.

A few things changed inside the UoW commit implementation in 2.16, and for sure this PR does not restore all details of the previous behavior. Take it with a grain of salt.

Here's the details.

The issue appears when UoW::commit() is performing entity insertions, and postPersist listener causes commit() reentrance when there are pending insertions left.

In that situation, the "inner" (reentrant) call will start working through all changesets. Eventually, it finishes with all insertions being performed and UoW::$entityInsertions being empty.

The entity insertion order, an array computed at the beginning of UnitOfWork::executeInserts(), still contains entities that now have been processed already. This leads to the reported failure down the road. The mitigation is to check for this condition and skip such entities.

Before #10547 (pre-2.16), things worked a bit differently:

The UoW did not build a list of entities (objects) as the commit order, but a sequence of classes to process.

For every entity class, it would find all entity instances in UoW::$entityInsertions and pass them to the persister in a batch. The persister, in turn, would execute the INSERT statements for all those entities before it dispatched the postInsert events.

That means when a postInsert listener was notified pre-2.16, the UoW was in a state where 1) all insertions for a particular class had been written to the database already and 2) the UoW had not yet gathered the next round of entities to process.

@dmaicher
Copy link
Contributor

dmaicher commented Aug 11, 2023

@mpdude this test fails for me on 2.16.x but works on 2.15.x

I added a few more entities that are persisted so I guess the flush during postPersist messes with the persisting of the other entities following afterwards?

<?php

declare(strict_types=1);

namespace Doctrine\Tests\ORM\Functional\Ticket;

use Doctrine\DBAL\Logging\LoggerChain;
use Doctrine\ORM\Event\PostPersistEventArgs;
use Doctrine\ORM\Events;
use Doctrine\Tests\DbalExtensions\SqlLogger;
use Doctrine\Tests\Models\CMS\CmsUser;
use Doctrine\Tests\OrmFunctionalTestCase;
use function sprintf;

class GH10869Test extends OrmFunctionalTestCase
{
    protected function setUp(): void
    {
        $this->useModelSet('cms');

        parent::setUp();
    }

    public function testPostPersistListenerUpdatingObject(): void
    {
        foreach (range(1, 3) as $i) {
            $joe           = new CmsUser();
            $joe->username = "joe-$i";
            $joe->name     = "Joe $i";
            $this->_em->persist($joe);
        }

        $this->_em->getEventManager()->addEventListener(Events::postPersist, new class {
            public function postPersist(PostPersistEventArgs $args): void
            {
                $object = $args->getObject();

                if (! $object instanceof CmsUser) {
                    return;
                }

                $objectManager = $args->getObjectManager();

                if ($object->status === null) {
                    $object->status = sprintf('joe-id=%s', $object->id);
                }

                $objectManager->flush();
            }
        });

        $this->_em->flush();
        $this->_em->clear();

        $joeReloaded = $this->_em->find(CmsUser::class, $joe->id);

        self::assertNotNull($joeReloaded->status);
        self::assertSame($joe->status, $joeReloaded->status);
    }
}

@mpdude mpdude changed the title (Try to) add a reproducer for #10869 Mitigate problems with EntityManager::flush() reentrance since 2.16.0 Aug 17, 2023
This PR addresses the issue brought up in doctrine#10869. It happens when users use event listeners – `postPersist` in particular – to make changes to entities and/or persist new entities and finally call `EntityManager::flush()`, while the `UnitOfWork` is currently within the commit phase.

There is a discussion in doctrine#10900 whether this kind of reentrance should be deprecated in 2.x and prevented in 3.x. But, in order to prevent complete breakage with the 2.16.0 update, this PR tries to apply a band-aid 🩹.

A few things changed inside the UoW commit implementation in 2.16, and for sure this PR does not restore all details of the previous behavior. Take it with a grain of salt.

Here's the details.

The issue appears when `UoW::commit()` is performing entity insertions, and `postPersist` listener causes `commit()` reentrance when there are pending insertions left.

In that situation, the "inner" (reentrant) call will start working through all changesets. Eventually, it finishes with all insertions being performed and `UoW::$entityInsertions` being empty.

The entity insertion order, an array computed at the beginning of `UnitOfWork::executeInserts()`, still contains entities that now have been processed already. This leads to the reported failure down the road. The mitigation is to check for this condition and skip such entities.

Before doctrine#10547 (pre-2.16), things worked a bit differently:

The UoW did not build a list of entities (objects) as the commit order, but a sequence of _classes_ to process.

For every entity _class_, it would find all entity instances in `UoW::$entityInsertions` and pass them to the persister in a batch. The persister, in turn, would execute the `INSERT` statements for _all_ those entities _before_ it dispatched the `postInsert` events.

That means when a `postInsert` listener was notified pre-2.16, the UoW was in a state where 1) all insertions for a particular class had been written to the database already and 2) the UoW had not yet gathered the next round of entities to process.
@mpdude mpdude marked this pull request as ready for review August 17, 2023 07:45
@mpdude
Copy link
Contributor Author

mpdude commented Aug 17, 2023

@dmaicher Please check and report if this fixes the issue for you

@dmaicher
Copy link
Contributor

dmaicher commented Aug 17, 2023

@mpdude thanks for all your efforts on this 👍

The SQL error with missing parameters is indeed gone.

However I don't think the behavior is quite the same as on 2.15 😕 Now some other test expectations are failing that seem related to the new entities that are persisted within postPersist.

I think its too dangerous to flush again while another flush is ongoing. So instead of somehow fixing this I would definitely prefer a new event and do another clean flush.

here a test that fails with your changes but works on 2.15:

<?php

declare(strict_types=1);

namespace Doctrine\Tests\ORM\Functional\Ticket;

use Doctrine\ORM\Event\PostPersistEventArgs;
use Doctrine\ORM\Events;
use Doctrine\ORM\Mapping as ORM;
use Doctrine\Tests\OrmFunctionalTestCase;

use function sprintf;

class GH10869Test extends OrmFunctionalTestCase
{
    protected function setUp(): void
    {
        parent::setUp();

        $this->setUpEntitySchema([
            GH10869Entity::class,
        ]);
    }

    public function testPostPersistListenerUpdatingObjectFieldWhileOtherInsertPending(): void
    {
        $entity1 = new GH10869Entity();
        $this->_em->persist($entity1);

        $entity2 = new GH10869Entity();
        $this->_em->persist($entity2);

        $this->_em->getEventManager()->addEventListener(Events::postPersist, new class {
            public function postPersist(PostPersistEventArgs $args): void
            {
                $object = $args->getObject();

                if ($object->field === 'foo') {
                    return;
                }

                $objectManager = $args->getObjectManager();
                $object->field = sprintf('test %s', $object->id);

                foreach (range(1, 2) as $ignored) {
                    $newObject = new GH10869Entity();
                    $newObject->field = 'foo';
                    $objectManager->persist($newObject);
                }

                $objectManager->flush();
            }
        });

        $this->_em->flush();
        $this->_em->clear();

        $entity1Reloaded = $this->_em->find(GH10869Entity::class, $entity1->id);
        self::assertSame($entity1->field, $entity1Reloaded->field);

        $entity2Reloaded = $this->_em->find(GH10869Entity::class, $entity2->id);
        self::assertSame($entity2->field, $entity2Reloaded->field);

        $objectsPersistedDuringPostPersist = $this->_em->getRepository(GH10869Entity::class)->count(['field' => 'foo']);
        self::assertSame(4, $objectsPersistedDuringPostPersist);
    }
}

/**
 * @ORM\Entity
 */
class GH10869Entity
{
    /**
     * @ORM\Id
     * @ORM\GeneratedValue
     * @ORM\Column(type="integer")
     *
     * @var ?int
     */
    public $id;

    /**
     * @ORM\Column(type="text", nullable=true)
     *
     * @var ?string
     */
    public $field;
}

@mpdude
Copy link
Contributor Author

mpdude commented Aug 17, 2023

What exactly is this difference?

@dmaicher
Copy link
Contributor

What exactly is this difference?

you mean how my test case is different from yours?

--- a/tests/Doctrine/Tests/ORM/Functional/GH10869Test.php
+++ b/tests/Doctrine/Tests/ORM/Functional/GH10869Test.php
@@ -35,8 +35,19 @@ class GH10869Test extends OrmFunctionalTestCase
             {
                 $object = $args->getObject();
 
+                if ($object->field === 'foo') {
+                    return;
+                }
+
                 $objectManager = $args->getObjectManager();
                 $object->field = sprintf('test %s', $object->id);
+
+                foreach (range(1, 2) as $ignored) {
+                    $newObject = new GH10869Entity();
+                    $newObject->field = 'foo';
+                    $objectManager->persist($newObject);
+                }
+
                 $objectManager->flush();
             }
         });
@@ -49,6 +60,9 @@ class GH10869Test extends OrmFunctionalTestCase
 
         $entity2Reloaded = $this->_em->find(GH10869Entity::class, $entity2->id);
         self::assertSame($entity2->field, $entity2Reloaded->field);
+
+        $objectsPersistedDuringPostPersist = $this->_em->getRepository(GH10869Entity::class)->count(['field' => 'foo']);
+        self::assertSame(4, $objectsPersistedDuringPostPersist);
     }

the last assert basically fails. On your branch its 6 and on 2.15 its 4 which would be correct.

@mpdude
Copy link
Contributor Author

mpdude commented Aug 17, 2023

No, I mean what exactly the difference is between 2.15 and 2.16... does it miss updates? Is only the ordering different?

@dmaicher
Copy link
Contributor

No, I mean what exactly the difference is between 2.15 and 2.16... does it miss updates? Is only the ordering different?

Hard to tell so far. It seems some new objects (added during postPersist) are not even persisted at all or so. I will check in more detail.

@dmaicher
Copy link
Contributor

dmaicher commented Aug 17, 2023

@mpdude ok after further checking it has to do with the changed order of entity insertion.

In the postPersist listener I expect some related entity to be already persisted (written to DB and assigned ID) and only then I can properly create the new entity.

So I think I need a new event 😋 Too brittle to make (false as we see now) assumptions about the order in which entities are persisted.

@mpdude
Copy link
Contributor Author

mpdude commented Aug 17, 2023

Yes, you were assuming that all other entities of the same class have already been written before the event is dispatched.

Maybe we can move the postInsert event a bit back – I'll have to think this through.

@dmaicher
Copy link
Contributor

Yes, you were assuming that all other entities of the same class have already been written before the event is dispatched.

actually in my case its not even about entities of the same class 😕 Its about another related class. The structure is a bit more complex.

Its like

    A
  ´   `
B       C

I listen to postPersist for C and expect that B is also already persisted. Which luckily was the case on 2.15 but now not anymore.

@mpdude
Copy link
Contributor Author

mpdude commented Aug 17, 2023

foreach ($entities as $entity) {
$this->listenersInvoker->invoke($class, Events::postPersist, $entity, new PostPersistEventArgs($entity, $this->em), $invoke);
}

This is the 2.15 version of UnitOfWork::executeInserts(), and it moved all the events to the very end of the method. If that would be possible in 2.16 as well, it might cover this case. Plus, we probably don't need the patch suggested in this PR.

@mpdude
Copy link
Contributor Author

mpdude commented Aug 17, 2023

@dmaicher Would this approach b66c3b4 work for your use case?

@mpdude
Copy link
Contributor Author

mpdude commented Aug 17, 2023

Regarding #10906 (comment) – I cannot reproduce that. For me, it seems to pass also based on 1895cf8

@dmaicher
Copy link
Contributor

dmaicher commented Aug 18, 2023

@dmaicher Would this approach b66c3b4 work for your use case?

@mpdude this works indeed 🎉 Let me try to run the full testsuite (after fixing some tests that reply on Ids and the order of persisted entites).

EDIT: all tests green now on my end! 😊

@derrabus
Copy link
Member

PHPCS fails. 🤓

@mpdude mpdude marked this pull request as draft August 18, 2023 08:19
@mpdude
Copy link
Contributor Author

mpdude commented Aug 18, 2023

Converting to draft for the time being. Will open a new PR later since the solution we landed at is somewhat different from what I initially proposed. To keep the PR history understandable, I think it would be better to move the "new" solution to a dedicated discussion.

Extra things I'd like to include:

  • Check/fix that also postRemove (and postUpdate) should have event dispatching deferred in a similar way
  • Include test to check that postPersist is called at a time where all entities have been persisted and their IDs assigned, but not necessarily all extra updates and collections have been processed. Check with two entities of different classes cross-referencing each other.
  • Update docs and explain this for the postPersist event
  • Optionally: State that calling flush() from event listeners is discouraged (maybe dedicated PR?)

mpdude added a commit to mpdude/doctrine2 that referenced this pull request Aug 21, 2023
….0 (Take 2)

The changes from doctrine#10547, which landed in 2.16.0, cause problems for users that call `EntityManager::flush()` from within `postPersist` event listeners.

* When `UnitOfWork::commit()` is re-entered, the "inner" (reentrant) call will start working through all changesets. Eventually, it finishes with all insertions being performed and UoW::$entityInsertions being empty. After return, the entity insertion order, an array computed at the beginning of `UnitOfWork::executeInserts()`, still contains entities that now have been processed already. This leads to a strange-looking SQL error where the number of parameters used does not match the number of parameters bound. This has been reported as doctrine#10869.

* The fixes made to the commit order computation may lead to a different entity insertion order than previously. `postPersist` listener code may be affected by this when accessing generated IDs for other entities than the one the event has been dispatched for. This ID may not yet be available when the insertion order is different from the one that was used before 2.16. This has been mentioned in doctrine#10906 (comment).

This PR suggests to address both issues by dispatching the `postPersist` event only after _all_ new entities have their rows inserted into the database. Likewise, dispatch `postRemove` only after _all_ deletions have been executed.

This solves the first issue because the sequence of insertions or deletions has been processed completely _before_ we start calling event listeners. This way, potential changes made by listeners will no longer be relevant.

Regarding the second issue, I think deferring `postPersist` a bit until _all_ entities have been inserted does not violate any promises given, hence is not a BC break. In 2.15, this event was raised after all insertions _for a particular class_ had been processed - so, it was never an "immediate" event for every single entity. doctrine#10547 moved the event handling to directly after every single insertion. Now, this PR moves it back a bit to after _all_ insertions.
mpdude added a commit to mpdude/doctrine2 that referenced this pull request Aug 21, 2023
….0 (Take 2)

The changes from doctrine#10547, which landed in 2.16.0, cause problems for users calling `EntityManager::flush()` from within `postPersist` event listeners.

* When `UnitOfWork::commit()` is re-entered, the "inner" (reentrant) call will start working through all changesets. Eventually, it finishes with all insertions being performed and `UoW::$entityInsertions` being empty. After return, the entity insertion order, an array computed at the beginning of `UnitOfWork::executeInserts()`, still contains entities that now have been processed already. This leads to a strange-looking SQL error where the number of parameters used does not match the number of parameters bound. This has been reported as doctrine#10869.

* The fixes made to the commit order computation may lead to a different entity insertion order than previously. `postPersist` listener code may be affected by this when accessing generated IDs for other entities than the one the event has been dispatched for. This ID may not yet be available when the insertion order is different from the one that was used before 2.16. This has been mentioned in doctrine#10906 (comment).

This PR suggests to address both issues by dispatching the `postPersist` event only after _all_ new entities have their rows inserted into the database. Likewise, dispatch `postRemove` only after _all_ deletions have been executed.

This solves the first issue because the sequence of insertions or deletions has been processed completely _before_ we start calling event listeners. This way, potential changes made by listeners will no longer be relevant.

Regarding the second issue, I think deferring `postPersist` a bit until _all_ entities have been inserted does not violate any promises given, hence is not a BC break. In 2.15, this event was raised after all insertions _for a particular class_ had been processed - so, it was never an "immediate" event for every single entity. doctrine#10547 moved the event handling to directly after every single insertion. Now, this PR moves it back a bit to after _all_ insertions.
@mpdude
Copy link
Contributor Author

mpdude commented Aug 21, 2023

Closing in favor of #10915.

@mpdude mpdude closed this Aug 21, 2023
mpdude added a commit to mpdude/doctrine2 that referenced this pull request Aug 22, 2023
….0 (Take 2)

The changes from doctrine#10547, which landed in 2.16.0, cause problems for users calling `EntityManager::flush()` from within `postPersist` event listeners.

* When `UnitOfWork::commit()` is re-entered, the "inner" (reentrant) call will start working through all changesets. Eventually, it finishes with all insertions being performed and `UoW::$entityInsertions` being empty. After return, the entity insertion order, an array computed at the beginning of `UnitOfWork::executeInserts()`, still contains entities that now have been processed already. This leads to a strange-looking SQL error where the number of parameters used does not match the number of parameters bound. This has been reported as doctrine#10869.

* The fixes made to the commit order computation may lead to a different entity insertion order than previously. `postPersist` listener code may be affected by this when accessing generated IDs for other entities than the one the event has been dispatched for. This ID may not yet be available when the insertion order is different from the one that was used before 2.16. This has been mentioned in doctrine#10906 (comment).

This PR suggests to address both issues by dispatching the `postPersist` event only after _all_ new entities have their rows inserted into the database. Likewise, dispatch `postRemove` only after _all_ deletions have been executed.

This solves the first issue because the sequence of insertions or deletions has been processed completely _before_ we start calling event listeners. This way, potential changes made by listeners will no longer be relevant.

Regarding the second issue, I think deferring `postPersist` a bit until _all_ entities have been inserted does not violate any promises given, hence is not a BC break. In 2.15, this event was raised after all insertions _for a particular class_ had been processed - so, it was never an "immediate" event for every single entity. doctrine#10547 moved the event handling to directly after every single insertion. Now, this PR moves it back a bit to after _all_ insertions.
mpdude added a commit to mpdude/doctrine2 that referenced this pull request Aug 22, 2023
….0 (Take 2)

The changes from doctrine#10547, which landed in 2.16.0, cause problems for users calling `EntityManager::flush()` from within `postPersist` event listeners.

* When `UnitOfWork::commit()` is re-entered, the "inner" (reentrant) call will start working through all changesets. Eventually, it finishes with all insertions being performed and `UoW::$entityInsertions` being empty. After return, the entity insertion order, an array computed at the beginning of `UnitOfWork::executeInserts()`, still contains entities that now have been processed already. This leads to a strange-looking SQL error where the number of parameters used does not match the number of parameters bound. This has been reported as doctrine#10869.

* The fixes made to the commit order computation may lead to a different entity insertion order than previously. `postPersist` listener code may be affected by this when accessing generated IDs for other entities than the one the event has been dispatched for. This ID may not yet be available when the insertion order is different from the one that was used before 2.16. This has been mentioned in doctrine#10906 (comment).

This PR suggests to address both issues by dispatching the `postPersist` event only after _all_ new entities have their rows inserted into the database. Likewise, dispatch `postRemove` only after _all_ deletions have been executed.

This solves the first issue because the sequence of insertions or deletions has been processed completely _before_ we start calling event listeners. This way, potential changes made by listeners will no longer be relevant.

Regarding the second issue, I think deferring `postPersist` a bit until _all_ entities have been inserted does not violate any promises given, hence is not a BC break. In 2.15, this event was raised after all insertions _for a particular class_ had been processed - so, it was never an "immediate" event for every single entity. doctrine#10547 moved the event handling to directly after every single insertion. Now, this PR moves it back a bit to after _all_ insertions.
mpdude added a commit to mpdude/doctrine2 that referenced this pull request Aug 23, 2023
….0 (Take 2)

The changes from doctrine#10547, which landed in 2.16.0, cause problems for users calling `EntityManager::flush()` from within `postPersist` event listeners.

* When `UnitOfWork::commit()` is re-entered, the "inner" (reentrant) call will start working through all changesets. Eventually, it finishes with all insertions being performed and `UoW::$entityInsertions` being empty. After return, the entity insertion order, an array computed at the beginning of `UnitOfWork::executeInserts()`, still contains entities that now have been processed already. This leads to a strange-looking SQL error where the number of parameters used does not match the number of parameters bound. This has been reported as doctrine#10869.

* The fixes made to the commit order computation may lead to a different entity insertion order than previously. `postPersist` listener code may be affected by this when accessing generated IDs for other entities than the one the event has been dispatched for. This ID may not yet be available when the insertion order is different from the one that was used before 2.16. This has been mentioned in doctrine#10906 (comment).

This PR suggests to address both issues by dispatching the `postPersist` event only after _all_ new entities have their rows inserted into the database. Likewise, dispatch `postRemove` only after _all_ deletions have been executed.

This solves the first issue because the sequence of insertions or deletions has been processed completely _before_ we start calling event listeners. This way, potential changes made by listeners will no longer be relevant.

Regarding the second issue, I think deferring `postPersist` a bit until _all_ entities have been inserted does not violate any promises given, hence is not a BC break. In 2.15, this event was raised after all insertions _for a particular class_ had been processed - so, it was never an "immediate" event for every single entity. doctrine#10547 moved the event handling to directly after every single insertion. Now, this PR moves it back a bit to after _all_ insertions.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants