Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUGFIX: Dont reset escr tables or ignored tables in general #37

Merged
merged 1 commit into from
Nov 17, 2023

Conversation

mhsdesign
Copy link
Member

@mhsdesign mhsdesign commented Nov 14, 2023

We respect flows option "ignoredTables" to preserve certain tables while resetting the database.
In our case we interpret everything in "ignoredTables" as not managed by doctrine.
And this trait should only clear tables for @flowEntities.
An important use case is keeping the Neos ESCR tables cr_*. alive for speeding up tests.

Docs for original idea of "ignoredTables": https://flowframework.readthedocs.io/en/9.0/TheDefinitiveGuide/PartIII/Persistence.html#ignoring-tables

Implementation copied from https://github.com/neos/flow-development-collection/blob/ed6a26603f682966816c71840524c7da6ed919a5/Neos.Flow/Classes/Command/DoctrineCommandController.php#L468-L474

@mhsdesign mhsdesign force-pushed the bugfix/dontResetEscrTablesOrIgnoredTablesInGeneral branch from b7e9ca6 to f09b2b3 Compare November 14, 2023 16:54
@bwaidelich
Copy link
Member

Clever idea, but I'm not sure whether it is transparent that this ignoredTable setting is in any way related to the behavior of truncated tables in behat tests..

An important use case is keeping the Neos ESCR tables cr_*. alive for speeding up tests.

With "alive" you mean, that we don't truncate them, right?
But I think they have to be truncated anyways..
Wouldn't it be more effective to reduce the number of database schema creations, e.g. by storing self::$databaseSchema only once per behat run?

@bwaidelich
Copy link
Member

btw: maybe the generic truncateAndSetupFlowEntities should not be used by the ESCR tests at all.. We know exactly which tables we have to reset there

@mhsdesign
Copy link
Member Author

I think we are talking a bit past each other.

First of:

the generic truncateAndSetupFlowEntities should not be used by the ESCR tests at all.

yes this is true. But the critical point are Neos.Neos tests where we mix flow entities and neos escr things.
Thats why the tests are tagged with @flowEntities see https://github.com/neos/neos-development-collection/blob/1bc116a6439fcb6735496c59677e7eb8f333a696/Neos.Neos/Tests/Behavior/Features/Fusion/ContentCollection.feature#L1

Clever idea, but I'm not sure whether it is transparent that this ignoredTable setting is in any way related to the behavior of truncated tables in behat tests..

Yeah this is kindof true. I think flow should somehow tell which tables are managed by doctrine and which are not. This setting is the place we need to set either way so there are no drop statements generated for all the cr tables. I think its okay to use this filter and its the thing that works out of the box. An alternative would be to check $somehow which tables have been created from within migrations (which is kind-of impossible) and then drop only those, or revert all doctrine migrations, which will also most likely behave weird and not result in the expected state of the tables being dropped.

With "alive" you mean, that we don't truncate them, right?

yes.

But I think they have to be truncated anyways..

but we do that explicitly via $contentRepository->resetProjectionStates();.

Wouldn't it be more effective to reduce the number of database schema creations, e.g. by storing self::$databaseSchema only once per behat run?
im not sure where youre getting with this, i thought you were against caching the schema? ^^ see

The point with this change is that we can setup the content repository only once for all tests $contentRepository->setUp();.
The projections contents are cleared in each iteration via $contentRepository->resetProjectionStates(); and this works quite well for the escr core tests. But for tests using the @flowEntities annotation and database reset we must avoid to drop all the escr tables as otherwise we would need to setup the cr again.

Why are we going through so much hassle to (hackily) setup the cr only once? Because each projection calls $schemaManager->createSchema() twice when being setup and this slows down test enormously by 80%

For our slack discussion see link

mhsdesign added a commit to mhsdesign/neos-development-collection that referenced this pull request Nov 14, 2023
With this change we setup the content repository only once for all tests `$contentRepository->setUp();`.
The projections contents are cleared in each iteration via `$contentRepository->resetProjectionStates();` and this works quite well for the escr core tests.
For tests using the `@flowEntities` annotation and database reset we must avoid to drop all the escr tables as otherwise we would need to setup the cr again. This will be fixed in neos/behat#37.

Why are we going through so much hassle to (hackily) setup the cr only once? Because each projection calls `$schemaManager->createSchema()` twice when being setup and this slows down test enormously by 80%
@kitsunet
Copy link
Member

kitsunet commented Nov 14, 2023

Yeah this is kindof true. I think flow should somehow tell which tables are managed by doctrine and which are not.

Flow was always of the opinion that the database is it's, as in ORM land, so no need to tell anyone, so technically we should separate the two worlds.

@mhsdesign
Copy link
Member Author

Flow was always of the opinion that the database is it's, as in ORM land, so no need to tell anyone, so technically we should separate the two worlds.

Hmm yes and that made sense for a time i guess. But now flow has grown up and must allow also its siblings to play with the db ^^

Every now an then a project has already non-doctrine managed tables. And i remember also using the ignoredTables option to declare that to flow. So i think its fine and simplifies things if we allow them to be mixed in one database, and this hasnt been a real problem until this cr behat testing dilemma.

Another alternative would be to somehow cache the createSchema calls and dont optimize the $cr->setUp(). That would have the same performance gain, but it comes with a downside as @bwaidelich said:

There is most certainly a lot of potential for speed improvements, but we should be careful not to let them degrade the system architecture.
E.g. I would suggest not to cache the createSchema call even though it might be tempting because that can lead to caching issues (you might have heard of them).

@mhsdesign
Copy link
Member Author

And to make the thing more complex: We have the same dilemma for functional tests. The "setUp" the cr once performance optimization doesnt work here, as the PersistenceManager::tearDown will just drop the whole database:

https://github.com/neos/flow-development-collection/blob/58a58224282e5bb4212718bb6e2711649fc2d4dc/Neos.Flow/Classes/Persistence/Doctrine/PersistenceManager.php#L343

(and actually with way less code than behat does ^^)

@bwaidelich
Copy link
Member

But the critical point are Neos.Neos tests where we mix flow entities and neos escr things.
Thats why the tests are tagged with @flowEntities

That's what I suggest to change. Partly.
In the tests we know what entities/records are created so those should take care of remove their fixtures before and/or afterwards. No need to reset the whole database.

Another alternative would be to somehow cache the createSchema calls and dont optimize the $cr->setUp()

I don't think, that we have to decide between the two:
IMO it makes perfect sense to only call the cr->setup() once for the Behat run (IIRC this is not supported with lifecycle hooks by Behat itself, but @skurfuerst had added some code to do that).

But we should also try to improve the performance of the setup() as it is apparently a bottleneck!

Currently we do

$schemaDiff = (new Comparator())->compare($schemaManager->createSchema(), $schema);
foreach ($schemaDiff->toSaveSql($connection->getDatabasePlatform()) as $statement) {
    $connection->executeStatement($statement);
}

(which is deprecated with doctrine/dbal 3+ by the way)

If I got it right, the problematic part is $schemaManager->createSchema() because it has to go through all tables and their columns and indexes – even though we are only interested in one or two tables per projection.
I'll try to find faster ways to achieve the same!

@kitsunet
Copy link
Member

Doctrine\DBAL\Schema\Comparator::diffTable and $schemaManager->introspectTable('tableName') could be a direction?

@mhsdesign
Copy link
Member Author

Yes that would be great. I had tunnel vision and thought that was the only way 😂 even though i knew we use toSaveSql as toSql will create a lot of drop statements for all those tables that werent in the schema.

@bwaidelich
Copy link
Member

I just fiddled a bit with the whole schema diffing, and just changing

$schema = $schemaManager->createSchema();

to

$schema = new Schema([$schemaManager->listTableDetails($tableName)], [], $schemaManager->createSchemaConfig());

reduced the runtime of a single instance from 0.08 s to 0.0012 s.
This seems like a very quick win.. And IMO that code could safely be runtime-cached because it only affects the given table(s)

@mhsdesign
Copy link
Member Author

That is perfect. But regarding this behat package i still think this is a clever idea to respect the "ignoredTables" so that @flowEntities doesnt dump the whole database. That would allow to make this trait more versatile and also usable for tables in your own tests that are not doctrine managed.

We could also rename the option persistence.doctrine.migrations.ignoredTables to be more versatile but imo its not worth for this edgecase. Alternatively flow could provide a testing helper (which would also be used for PersistenceManager::tearDown) that drops the appropriate parts of the database and itself will consider the ignoredTables option.

In the tests we know what entities/records are created so those should take care of remove their fixtures before and/or afterwards. No need to reset the whole database.

this is easier said than done. For example we create entities at many unexpected places. The routing trait creates a site for example, but what if another trait also allows the site creation?. Using the $repository->removeAll in random orders across traits will almost certainly fail due to constraints. And for functional tests the same applies.

I think for testing flow should only reset doctrine managed parts.

@bwaidelich
Copy link
Member

That is perfect. But regarding this behat package i still think this is a clever idea to
respect the "ignoredTables" so that @flowEntities doesnt dump the whole database

Yes, I agree. It's called truncateAndSetupFlowEntities – which is a really weird name, but at least it is about entities and those most probably won't be part of tables that are marked ignored.

We could also rename the option persistence.doctrine.migrations.ignoredTables

No no, that was not my point :)

this is easier said than done. For example we create entities at many unexpected places.
The routing trait creates a site for example, but what if another trait also allows the site creation?

I don't think that this is a problem at all. Just make sure that for every step that creates an entity a corresponding @AfterScenario removes those. Preferably in the same trait so it's not done multiple times, but even if only the first call would truncate and the following would be no-ops essentially.
If this leads to errors due to constraints, we can make it more fine grained. But I don't think that it will be an issue because we know exactly what code is being executed during one step.

I think for testing flow should only reset doctrine managed parts

And I think, Flow should not reset anything but tests that create entities should take care of cleaning up after themselves :)

Copy link
Member

@bwaidelich bwaidelich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO it's fine to go with that for now, but personally I would prefer not to use truncateAndSetupFlowEntities in the long run

Classes/FlowEntitiesTrait.php Show resolved Hide resolved
@kitsunet kitsunet merged commit 9c82357 into 9.0 Nov 17, 2023
@kitsunet kitsunet deleted the bugfix/dontResetEscrTablesOrIgnoredTablesInGeneral branch November 17, 2023 16:40
mhsdesign added a commit to mhsdesign/neos-development-collection that referenced this pull request Nov 17, 2023
With this change we setup the content repository only once for all tests `$contentRepository->setUp();`.
The projections contents are cleared in each iteration via `$contentRepository->resetProjectionStates();` and this works quite well for the escr core tests.
For tests using the `@flowEntities` annotation and database reset we must avoid to drop all the escr tables as otherwise we would need to setup the cr again. This will be fixed in neos/behat#37.

Why are we going through so much hassle to (hackily) setup the cr only once? Because each projection calls `$schemaManager->createSchema()` twice when being setup and this slows down test enormously by 80%
neos-bot pushed a commit to neos/contentrepository-testsuite that referenced this pull request Nov 18, 2023
With this change we setup the content repository only once for all tests `$contentRepository->setUp();`.
The projections contents are cleared in each iteration via `$contentRepository->resetProjectionStates();` and this works quite well for the escr core tests.
For tests using the `@flowEntities` annotation and database reset we must avoid to drop all the escr tables as otherwise we would need to setup the cr again. This will be fixed in neos/behat#37.

Why are we going through so much hassle to (hackily) setup the cr only once? Because each projection calls `$schemaManager->createSchema()` twice when being setup and this slows down test enormously by 80%
neos-bot pushed a commit to neos/contentrepositoryregistry-testsuite that referenced this pull request Nov 18, 2023
With this change we setup the content repository only once for all tests `$contentRepository->setUp();`.
The projections contents are cleared in each iteration via `$contentRepository->resetProjectionStates();` and this works quite well for the escr core tests.
For tests using the `@flowEntities` annotation and database reset we must avoid to drop all the escr tables as otherwise we would need to setup the cr again. This will be fixed in neos/behat#37.

Why are we going through so much hassle to (hackily) setup the cr only once? Because each projection calls `$schemaManager->createSchema()` twice when being setup and this slows down test enormously by 80%
neos-bot pushed a commit to neos/neos that referenced this pull request Nov 18, 2023
With this change we setup the content repository only once for all tests `$contentRepository->setUp();`.
The projections contents are cleared in each iteration via `$contentRepository->resetProjectionStates();` and this works quite well for the escr core tests.
For tests using the `@flowEntities` annotation and database reset we must avoid to drop all the escr tables as otherwise we would need to setup the cr again. This will be fixed in neos/behat#37.

Why are we going through so much hassle to (hackily) setup the cr only once? Because each projection calls `$schemaManager->createSchema()` twice when being setup and this slows down test enormously by 80%
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants