-
Notifications
You must be signed in to change notification settings - Fork 192
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
StorageBackend
: Add the initialise
method
#5760
StorageBackend
: Add the initialise
method
#5760
Conversation
@ltalirz this is an alternative to #5758 . By adding the |
Pinging @chrisjsewell in case you are interested to comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks a lot @sphuber , I think these changes make the codebase clearer.
just some minor questions/comments
aiida/storage/sqlite_temp/backend.py
Outdated
@@ -70,6 +70,10 @@ def version_head(cls) -> str: | |||
def version_profile(cls, profile: Profile) -> str | None: # pylint: disable=unused-argument | |||
return get_schema_version_head() | |||
|
|||
@classmethod | |||
def initialise(cls, profile: 'Profile', reset: bool = False) -> None: | |||
pass |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just for me to understand: why is initialisation and migration a no-op for this storage backend?
Should it not also raise a NotImplementedError
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be honest, I don't know. I just replicated the behavior for the migrate
method. These backends were added by @chrisjsewell and are mostly used for export archives, which are initialised manually maybe in the export code. That being said, I think this means this will fail when users try to initialise a profile of this storage type through verdi setup
. But then again, that was already the case. This PR is not changing any of that. Might want to open an issue to see whether this should be possible.
with self._connection_context() as connection: | ||
if inspect(connection).has_table(self.alembic_version_tbl_name): | ||
for table_name in ( | ||
'db_dbgroup_dbnodes', 'db_dbgroup', 'db_dblink', 'db_dbnode', 'db_dblog', 'db_dbauthinfo', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just checking whether this list of tables can either be deduced from somewhere (or, if not, whether it should be stored in a reusable constant)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure to be honest. There is one similar list in the PsqlDosBackend._clear
method minus the db_dbsetting
table. I tried to change as little as possible and just get the minimum that is required.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeh it should likely be deduced by reflection
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. Also changed the backend to call the implementation of the migrator, since that was also hard-coding the tables in the _clear
method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have isolated the changes that I think are responsible. Just pushed the commit without those changes to see if the test times return to normal. Then I will add the isolated changes in a separate commit and run again. That should give us a good sense whether the change in time is real. Also easier to review the changes and spot a mistake in connection handling.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh yeh, I remember there being issues with reflection and opening/closing sessions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, without the table delete changes, test times are normal (~13-14 mins): https://github.com/sphuber/aiida-core/actions/runs/3479079616/jobs/5817199424
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh yeah, I remember there being issues with reflection and opening/closing sessions
The initial change I added of using reflection caused the tests to hang. I tracked this down to reflection indeed opening a session and so the _clear
call of the next test would hang, since there was an outstanding connection. I could fix this by explicitly closing the session in _clear
. However, I think there may still be a problem with the handling of connections in PsqlDosMigrator
. Many methods need a connection and they will just open one through the _connection_context
method. However, it doesn't seem that these ever get closed. The opened connection is simply returned and not yielded and then closed at the end of the context manager. I will push a commit to add explicit closing and see if this fixes the "too many active connections" problem. It will probably not fix the increase in time though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, when adding the delete_tables
method the tests for Python 3.8 run in reasonably normal time, with tests for Python 3.10 being slightly longer at ~15-16 minutes. Before I saw ~20 minutes so think there is quite some variability between runs independent of changes.
However, the tests do fail due to too many connections being opened (only for Python 3.8 though, but this could also be random potentially). Will now push a tentative fix for this and see if that resolves things.
aiida/storage/sqlite_zip/backend.py
Outdated
@@ -78,6 +78,10 @@ def create_profile(path: str | Path, options: dict | None = None) -> Profile: | |||
def version_profile(cls, profile: Profile) -> Optional[str]: | |||
return read_version(profile.storage_config['path'], search_limit=None) | |||
|
|||
@classmethod | |||
def initialise(cls, profile: 'Profile', reset: bool = False) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be implemented in this PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure why, given that this backend is read-only and so it cannot be used for creating a normal profile through verdi setup
nor used as a test profile, which are the only two places where initialise
is called. But I implemented it regardless.
8586378
to
a3e994f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All relevant points from my side have been addressed (as far as I understand the sqlite_temp
backend initialisation is indeed no change in behavior, and so it does not need to be fixed here, although it would be good to address eventually).
a3e994f
to
637f359
Compare
637f359
to
fe57fd4
Compare
@chrisjsewell all comments addressed. Please give it a review |
6f3830c
to
e4d4dba
Compare
So it seems the refactoring of the reflection on the schema to remove the hardcoding of the table names needs some more work. But this hard-coding was already present in the code and is not something that is changed by this PR, I propose to move this to a separate PR in order to unblock this one. |
8226e70
to
77185fa
Compare
I have opened #5769 to track the problem and will move the last two commits I had here with the partial implementation to a separate PR where we can work on working out the final kinks. |
Sounds good to me! |
77185fa
to
ff667a6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please trigger the nightly tests. Since this contains all the migration tests, which may well be affected by this PR
Done, and they pass |
The current interface of the `StorageBackend` provides two method to control the initialisation and lifetime of a storage: * `migrate` * `_clear` The `migrate` method will initialise a completely new storage or migrate an already initialised backend to the latest schema version, whereas the `_clear` method will clear all data from the storage. In reality, the current implementation of `_clear` will actually completely destroy the container and recreate it from scratch, which causes a new repository UUID to be generated. Conceptually, but also practically, it is beneficial to restructure this design slightly. When using storage for a testing profile, it may be necessary to completely reset a storage backend and reinitialise it. Therefore the `initialise` method is added to the `StorageBackend` abstract base class. This method replaces the `migrate` method as the method to be called when a new storage backend is created and needs to be initialised. It should make sure that at the end of the initialisation the storage is at the latest schema version, i.e., by calling `migrate` internally. The method takes an argument `reset`, which is `False` by default, but when set to `True`, the storage backend should be completely reset before reinitialising. Typically this entails that the storage is completely destroyed and reinitialised in a fresh state. This is used by the `TemporaryProfileManager` test utility class when a new temporary profile is created. Although the tests in `aiida-core` essentially always use the `PsqlDosBackend` and so the Postgres database will always have the correct initialisation, storage backends are now pluginable and in plugins we may want to switch between storage backends within a test suite, in which case the storage backends needs to be fully reinitialised.
This allows the workflow to be triggered manually. This is useful if a PR is created that doesn't trigger the normal conditions, for example the files marked in the `pull_request` trigger are not modified, but we still want to verify that the changes pass the tests.
319a337
to
198e9d2
Compare
The current interface of the
StorageBackend
provides two method tocontrol the initialisation and lifetime of a storage:
migrate
_clear
The
migrate
method will initialise a completely new storage or migratean already initialised backend to the latest schema version, whereas the
_clear
method will clear all data from the storage. In reality, thecurrent implementation of
_clear
will actually completely destroy thecontainer and recreate it from scratch, which causes a new repository
UUID to be generated.
Conceptually, but also practically, it is beneficial to restructure this
design slightly. When using storage for a testing profile, it may be
necessary to completely reset a storage backend and reinitialise it.
Therefore the
initialise
method is added to theStorageBackend
abstract base class. This method replaces the
migrate
method as themethod to be called when a new storage backend is created and needs to
be initialised. It should make sure that at the end of the initialisation
the storage is at the latest schema version, i.e., by calling
migrate
internally.
The method takes an argument
reset
, which isFalse
by default, butwhen set to
True
, the storage backend should be completely reset beforereinitialising. Typically this entails that the storage is completely
destroyed and reinitialised in a fresh state.
This is used by the
TemporaryProfileManager
test utility class whena new temporary profile is created. Although the tests in
aiida-core
essentially always use the
PsqlDosBackend
and so the Postgres databasewill always have the correct initialisation, storage backends are now
pluginable and in plugins we may want to switch between storage backends
within a test suite, in which case the storage backends needs to be fully
reinitialised.