
Adding nodes to group optimisation using SQLA code to generate SQL INSERT #2518

Conversation

szoupanos
Contributor

This is an implementation for issue #1319, using SQLA code to generate the SQL INSERT statement.

It would be nice to test its performance too (@ltalirz, if you can easily provide your 5800-parameter-data export file in the 0.4 export version, it would help a lot).
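The idea can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the schema is simplified, an in-memory SQLite engine stands in for PostgreSQL, and the table/column names (`db_dbgroup_dbnodes`, `dbgroup_id`, `dbnode_id`) are assumptions based on AiiDA's naming conventions.

```python
# Illustrative sketch only: build a single bulk INSERT into the group-node
# membership table with SQLAlchemy Core, instead of adding nodes one by one
# through the ORM.
from sqlalchemy import Column, Integer, MetaData, Table, create_engine

engine = create_engine("sqlite://")  # stand-in for AiiDA's PostgreSQL engine
metadata = MetaData()

# Simplified membership table; the real AiiDA table has more constraints.
group_nodes = Table(
    "db_dbgroup_dbnodes",
    metadata,
    Column("id", Integer, primary_key=True),
    Column("dbgroup_id", Integer, nullable=False),
    Column("dbnode_id", Integer, nullable=False),
)
metadata.create_all(engine)

group_pk = 1
node_pks = list(range(1, 10001))

# One executemany-style INSERT instead of ~10k individual ORM flushes.
with engine.begin() as conn:
    conn.execute(
        group_nodes.insert(),
        [{"dbgroup_id": group_pk, "dbnode_id": pk} for pk in node_pks],
    )

with engine.connect() as conn:
    n_rows = len(conn.execute(group_nodes.select()).fetchall())
```

The point of the Core-level INSERT is that SQLAlchemy emits a single executemany statement, avoiding per-object identity-map and flush overhead.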

@szoupanos szoupanos requested review from ltalirz and sphuber February 22, 2019 18:50
@szoupanos szoupanos force-pushed the issue_1319_for_provenance_redesign branch from d16af0f to 482ea6a Compare February 22, 2019 18:51
@coveralls

coveralls commented Feb 22, 2019


Coverage increased (+0.04%) to 70.127% when pulling ccfbcb2 on szoupanos:issue_1319_for_provenance_redesign into da6bb4d on aiidateam:provenance_redesign.

@ltalirz
Member

ltalirz commented Feb 23, 2019

Here you go: I put in 10k nodes, and I had to disable the warning

Critical: the export functionality is currently disabled until issue #2342 is addressed

to be able to export this (on commit f8c0191).

pmd_10000.zip

P.S. It seems to me that on provenance_redesign, querying for the 10k nodes is very slow (Django backend)... this was a bit unexpected for me:

from aiida.plugins import DataFactory
from aiida.orm import QueryBuilder

pd = DataFactory('parameter')

qb = QueryBuilder()
qb.append(pd)
res = qb.all()
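To turn "very slow" into a number, the query can be wrapped in a stdlib timer; in the sketch below, `fetch_all` is a stand-in for the `qb.all()` call above so the snippet runs without an AiiDA profile.

```python
# Minimal timing helper (stdlib only); fetch_all stands in for qb.all().
import time

def timed(fn):
    """Return (result, elapsed_seconds) for a zero-argument callable."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start

fetch_all = lambda: list(range(10000))  # stand-in for qb.all()
res, elapsed = timed(fetch_all)
print(len(res), elapsed)
```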

@szoupanos
Contributor Author

Did you also test the performance of the import in both backends (and of the group addition)?
If not, I will test it (I am mainly asking to see where we stand on this and to finish the merging).

@ltalirz
Member

ltalirz commented Feb 27, 2019

I didn't test import speed on provenance_redesign

@szoupanos szoupanos force-pushed the issue_1319_for_provenance_redesign branch from 482ea6a to 5340674 Compare February 28, 2019 10:50
@szoupanos
Contributor Author

Things are better than I expected regarding speed (I imported the file provided by Leopold into both backends):

Django

real	2m29.966s
user	1m45.807s
sys	0m9.154s

SQLA

real	3m33.812s
user	2m32.658s
sys	0m12.951s

(aiidapy2) aiida@ubuntu_vm1_new:~/spyros_test$ verdi -p issue_1759_dj group list -A -C
Info: If you want to see the groups of all types, please add -a/--all-types option
  PK  Label    Type string    User                      Node count
----  -------  -------------  ----------------------  ------------
   1  pmd      user           leopold.talirz@epfl.ch         10000
(aiidapy2) aiida@ubuntu_vm1_new:~/spyros_test$ verdi -p issue_1759_sqla group list -A -C
Info: If you want to see the groups of all types, please add -a/--all-types option
  PK  Label    Type string    User                      Node count
----  -------  -------------  ----------------------  ------------
   1  pmd      user           leopold.talirz@epfl.ch         10000

@szoupanos
Contributor Author

If you don't have any more comments, please merge it.

@ltalirz
Member

ltalirz commented Feb 28, 2019

OK, so to summarize:
Going by the numbers given for the raw SQL implementation for 5800 nodes, this is about a factor of 6 slower for Django and a factor of 5 slower for SQLA.
That is something we can live with for the moment, and certainly a great improvement over the status quo.

I'd be happy to merge this but I think we should get the OK from either @sphuber or @giovannipizzi as well.

@szoupanos
Contributor Author

@ltalirz That's a good point... When I mentioned that the speed is better than what I expected, I looked only at the import speed of Django vs SQLA in the current implementation and branch. And indeed, here the SQLA import is just 50% slower than the Django import (whereas I expected Django to be 4 times faster than SQLA, given that this group-addition optimisation is slower than the raw SQL optimisation that we did in #2471).

I didn't compare the numbers with the 0.12 implementation (#2471) - thanks for doing this - and I am a bit surprised. My expectation was that Django would have a speed similar to #2471 and that SQLA would be 2 times slower than the SQLA implementation of #2471.
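As a sanity check, the "50% slower" figure follows directly from the wall-clock times posted above (2m29.966s for Django, 3m33.812s for SQLA on the 10k-node file):

```python
# Slowdown of the SQLA import relative to Django, from the measured times.
django_s = 2 * 60 + 29.966   # 2m29.966s -> 149.966 s
sqla_s = 3 * 60 + 33.812     # 3m33.812s -> 213.812 s
ratio = sqla_s / django_s
print(round(ratio, 2))  # -> 1.43, i.e. roughly 40-50% slower
```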

…des the option to bypass the SQLA ORM and speed up the addition of nodes to a group.
This option is only available at the Backend level, and therefore only to users who need this speedup (ORM-level users should not be aware of such options).
The SQLA import was modified to use this optimisation.
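The layering described in the commit message can be illustrated schematically. The class and argument names below (`BackendGroup`, `skip_orm`) are illustrative stand-ins, not AiiDA's actual API: the point is only that the escape hatch lives at the backend level, while the ORM-level class hides it.

```python
# Schematic sketch (illustrative names, not AiiDA's real classes) of the
# layering described above: the ORM-level Group hides the optimisation,
# while the backend entity exposes a skip_orm-style fast path.
class BackendGroup:
    """Backend-level group: may expose performance escape hatches."""

    def __init__(self):
        self.node_ids = []

    def add_nodes(self, node_ids, skip_orm=False):
        if skip_orm:
            # Fast path: conceptually one bulk INSERT for all rows.
            self.node_ids.extend(node_ids)
        else:
            # Slow path: conceptually one ORM flush per node.
            for node_id in node_ids:
                self.node_ids.append(node_id)


class Group:
    """ORM-level group: users never see skip_orm."""

    def __init__(self):
        self._backend_entity = BackendGroup()

    def add_nodes(self, node_ids):
        self._backend_entity.add_nodes(node_ids)


# Normal users go through the ORM-level Group and never see the flag;
# the modified SQLA import talks to the backend entity directly.
group = Group()
group.add_nodes(range(5))

fast_group = BackendGroup()
fast_group.add_nodes(range(10000), skip_orm=True)
```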
@szoupanos szoupanos force-pushed the issue_1319_for_provenance_redesign branch from 5340674 to c386e06 Compare February 28, 2019 12:24
@ltalirz
Member

ltalirz commented Mar 1, 2019

@szoupanos From your last message it is not entirely clear whether you still intend to work on this. Could you perhaps let us know?

@szoupanos
Contributor Author

I would say no for the moment, since I understand that the current overall performance is still acceptable for you (and the performance of the optimised SQLA import is much better than that of the non-optimised SQLA import we had before).

I think that this is OK for the 1.0 beta; let's assess our goals for 1.0 given the limited time that we have (e.g. I think that it is worth the effort to try to implement Django JSONB rather than continuing to investigate this).

@sphuber sphuber self-requested a review March 2, 2019 16:10