-
Notifications
You must be signed in to change notification settings - Fork 191
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Speed up creation of Nodes in the AiiDA ORM #2214
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ltalirz great!!
Can I suggest to add a step in travis (not necessarily a test, maybe an additional test) that runs the speed test (e.g. for creating 10k nodes, and for running say 10x a simple verdi command that a) require load_dbenv and b) does not require it, and output some timings? This might be great in the long term to go back in history and monitor when something broke speedup. Moreover, in Jenkins one can output "artifacts" that are stored, maybe also in travis there is something similar. If you don't have the time to do it but you're ok with this, feel free to copy this text in an issue (and even assign me).
I realize tests are falling, there is at least one relative import that has not been moved |
For performance tests, I opened #2215 This PR is still a little bit dirty and before I move forward I have a few questions, perhaps also for @sphuber or @muhrin :
|
Dear Leo, thanks for the work - looks good. My response:
|
@muhrin thanks a lot for your thoughts! Re 2.: Ok. So, the idea is that when you load a new profile (when this becomes possible...), the AiiDAManager cache will be reset, correct? I will add this to its docstring. Re 3.: Could you elaborate on this a bit? In 1. you wrote that the
Furthermore, is there currently anything that invalidates the cache? |
|
This holds even when calling with the default backend, correct? I.e. caching stuff inside
So, for the moment, should we cache the default user inside the |
Hmm...I thought it was there already but you're right, it's not. The place for caching (the default) objects collection is here: It's quick to add, happy to do it because the performance implications could be great. If you get this PR through I can add the caching, you can track it here: #2216 I would avoid putting the caching in the |
Now if a new collection is created via .objects it is cached for the next time in a lazy store. Similarly the default user for a User collection is cached after being created for the first time. This may lead to issues if the default user can change for a given profile during runtime, however for now we don't support this.
|
||
# pylint: disable=wildcard-import | ||
|
||
from .profile import * |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really need a wildcard here?
Perhaps just import the functions we want (I guess the constants defined in profile.py are not needed here (?))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed this to what we do in other prats of the code (which I borrowed from the asyncio codebase):
from .profile import *
__all__ = profile.__all__
This way we delegate what gets imported to profile
itself.
aiida/common/profile.py
Outdated
@@ -126,6 +126,28 @@ def rmq_prefix(self): | |||
def is_test_profile(self): | |||
return self._test_profile | |||
|
|||
@property | |||
def configured_user_email(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to introduce a different adjective (configured user) here?
Why not default user?
Edit: I see you just moved the function. Perhaps still a good time to rename it now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed to default_user_email
email = self.config[DEFAULT_USER_CONFIG_FIELD] | ||
# I do not catch the error in case of missing configuration, because | ||
# it is already a ConfigurationError | ||
except KeyError: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was this an exception before as well?
Should it be forbidden not to have a default user?
Edit: I see, it was...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it's forbidden. Can't store any nodes without one.
from aiida.backends.djsite.db.testbase import DjangoTests | ||
DjangoTests().clean_db() | ||
|
||
def __clean_db_sqla(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this stuff no longer necessary, this creation of a new_user object?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's now done at the ORM level in the aiida/backends/testimplbase.py
insert_data
so it's backend independent...this is the way, ideally, all these things would be but we can't delete everything from ORM function calls.
@muhrin added a few comments, otherwise looks good to me (+ tested that node creation is fast again :-) )! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor comments as well
.ci/test_fixtures.py
Outdated
@@ -25,22 +25,24 @@ class FixtureManagerTestCase(unittest.TestCase): | |||
"""Test the FixtureManager class""" | |||
|
|||
def setUp(self): | |||
print("Starting set up") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe get rid of the print statements or make them debug/info logs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed
.ci/test_fixtures.py
Outdated
@@ -55,6 +57,8 @@ def test_create_use_destroy_profile(self): | |||
from aiida import is_dbenv_loaded | |||
with Capturing() as output: | |||
self.fixture_manager.create_profile() | |||
|
|||
print("1") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
left over debug print statements
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Expunged
# user is recreated because it caches the user! | ||
# In any case, store it in self.user though | ||
# email = get_manager().get_profile().configured_user_email | ||
# self.test_session.query(DbUser).filter(DbUser.email != email).delete() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this comment still relevant if the accompanying line is commented out?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Deleted
aiida/orm/entities.py
Outdated
def __init__(self, backend, entity_class): | ||
"""Construct a new entity collection""" | ||
# assert issubclass(entity_class, Entity), "Must provide an entity type" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we uncomment this assert, or if not just remove it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Uncommented
aiida/orm/users.py
Outdated
try: | ||
return False, self.get(**kwargs) | ||
except exceptions.NotExistent: | ||
return True, User(backend=self.backend, **kwargs) | ||
|
||
def get_default(self): | ||
""" | ||
Get the current default user | ||
Get the current default user |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wrong indentation in docstring
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unindented
aiida/restapi/translator/computer.py
Outdated
# A label associated to the present class (coincides with the resource name) | ||
__label__ = "computers" | ||
# The AiiDA class one-to-one associated to the present class | ||
from aiida.orm import Computer | ||
_aiida_class = Computer | ||
# from aiida.orm import Computer |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
delete commented import
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Erased
|
||
computer = Computer.get('deneb') | ||
dbauthinfo = computer.get_dbauthinfo(get_automatic_user()) | ||
dbauthinfo = computer.get_dbauthinfo(orm.User.objects.get_default()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be done more succinctly now
computer = Computer.get('deneb')
transport = computer.get_transport()
scheduler = computer.get_scheduler()
scheduler.set_transport(transport)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Replaced
This way we have greater control over its creation and we don't have to implement a reset method which starts to get complicate. We just throw the object away and instsantiate a new one when we need to reset * Removed construct_backend() (replaced by get_manager().get_backend()) * Changed all tests to not specify the backend, they should just do what user facing code does which is use the default one and if we ever want to test with different backends we just implement a function to load a new backend from a profile globally * Moved getting default user email to the `Profile` class
PR aiidateam#2214 stripped of safeguards that prevented the default user from being deleted in between tests. Since AiiDA expects a default user in many operations, leaving the DB in a state with no default user is not ideal. This commit reintroduces safeguards that result in the default user remaining untouched.
aiida.work
toaiida.manage
aiida.orm.entities.Collection
aiida.orm.user.User.Collection
(fix Add caching of object collections #2216)Benchmark on MacBook Pro, python3
Node.__init__
)Node.__init__
)The timing of 58.5s is significantly worse than what we wrote down during the tests with @szoupanos and @giovannipizzi (which was ~10s for 10k nodes), i.e. probably someone broke the caching.
Anyhow, the target we set then was: 2s for 10k nodes, which we have reached now on python3 (it's the 1.7s that should scale linearly with the number of nodes).