
Adding nodes to group optimisation using SQLA code to generate SQL INSERT #2518

Conversation

szoupanos
Contributor

This is an implementation for issue #1319, using SQLA code to generate the SQL INSERT statement.

It would be nice to test its performance too (@ltalirz, if you can easily provide your 5800-parameter-data export file in the 0.4 export version, it would help a lot).
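The idea can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the schema is simplified, an in-memory SQLite engine stands in for PostgreSQL, and the table/column names (`db_dbgroup_dbnodes`, `dbgroup_id`, `dbnode_id`) are assumptions based on AiiDA's naming conventions.

```python
# Illustrative sketch only: build a single bulk INSERT into the group-node
# membership table with SQLAlchemy Core, instead of adding nodes one by one
# through the ORM.
from sqlalchemy import Column, Integer, MetaData, Table, create_engine

engine = create_engine("sqlite://")  # stand-in for AiiDA's PostgreSQL engine
metadata = MetaData()

# Simplified membership table; the real AiiDA table has more constraints.
group_nodes = Table(
    "db_dbgroup_dbnodes",
    metadata,
    Column("id", Integer, primary_key=True),
    Column("dbgroup_id", Integer, nullable=False),
    Column("dbnode_id", Integer, nullable=False),
)
metadata.create_all(engine)

group_pk = 1
node_pks = list(range(1, 10001))

# One executemany-style INSERT instead of ~10k individual ORM flushes.
with engine.begin() as conn:
    conn.execute(
        group_nodes.insert(),
        [{"dbgroup_id": group_pk, "dbnode_id": pk} for pk in node_pks],
    )

with engine.connect() as conn:
    n_rows = len(conn.execute(group_nodes.select()).fetchall())
```

The point of the Core-level INSERT is that SQLAlchemy emits a single executemany statement, avoiding per-object identity-map and flush overhead.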

@szoupanos szoupanos requested review from ltalirz and sphuber February 22, 2019 18:50
@szoupanos szoupanos force-pushed the issue_1319_for_provenance_redesign branch from d16af0f to 482ea6a Compare February 22, 2019 18:51
@coveralls

coveralls commented Feb 22, 2019


Coverage increased (+0.04%) to 70.127% when pulling ccfbcb2 on szoupanos:issue_1319_for_provenance_redesign into da6bb4d on aiidateam:provenance_redesign.

@ltalirz
Member

ltalirz commented Feb 23, 2019

Here you go: I put in 10k nodes, and I had to disable the warning

Critical: the export functionality is currently disabled until issue #2342 is addressed

to be able to export this (on commit f8c0191).

pmd_10000.zip

P.S. It seems to me that on provenance_redesign, querying for the 10k nodes is very slow (Django backend)... this was a bit unexpected for me:

from aiida.plugins import DataFactory
from aiida.orm import QueryBuilder

pd = DataFactory('parameter')

qb = QueryBuilder()
qb.append(pd)
res = qb.all()
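To turn "very slow" into a number, the query can be wrapped in a stdlib timer; in the sketch below, `fetch_all` is a stand-in for the `qb.all()` call above so the snippet runs without an AiiDA profile.

```python
# Minimal timing helper (stdlib only); fetch_all stands in for qb.all().
import time

def timed(fn):
    """Return (result, elapsed_seconds) for a zero-argument callable."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start

fetch_all = lambda: list(range(10000))  # stand-in for qb.all()
res, elapsed = timed(fetch_all)
print(len(res), elapsed)
```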

@szoupanos
Contributor Author

Did you also test the performance of the import in both backends (and of the group addition)?
If not, I will test it (I am mainly asking to see where we stand on this and to finish the merging).

@ltalirz
Member

ltalirz commented Feb 27, 2019

I didn't test import speed on provenance_redesign

@szoupanos szoupanos force-pushed the issue_1319_for_provenance_redesign branch from 482ea6a to 5340674 Compare February 28, 2019 10:50
@szoupanos
Contributor Author

Things are better than I expected regarding speed (I imported the file provided by Leopold into both backends):

Django

real	2m29.966s
user	1m45.807s
sys	0m9.154s

SQLA

real	3m33.812s
user	2m32.658s
sys	0m12.951s

(aiidapy2) aiida@ubuntu_vm1_new:~/spyros_test$ verdi -p issue_1759_dj group list -A -C
Info: If you want to see the groups of all types, please add -a/--all-types option
  PK  Label    Type string    User                      Node count
----  -------  -------------  ----------------------  ------------
   1  pmd      user           leopold.talirz@epfl.ch         10000
(aiidapy2) aiida@ubuntu_vm1_new:~/spyros_test$ verdi -p issue_1759_sqla group list -A -C
Info: If you want to see the groups of all types, please add -a/--all-types option
  PK  Label    Type string    User                      Node count
----  -------  -------------  ----------------------  ------------
   1  pmd      user           leopold.talirz@epfl.ch         10000

@szoupanos
Contributor Author

If you don't have any more comments, please merge it.

@ltalirz
Member

ltalirz commented Feb 28, 2019

OK, so to summarize:
Going by the numbers given for the raw SQL implementation for 5800 nodes, this is about a factor of 6 slower for Django and a factor of 5 slower for SQLA.
That is something we can live with for the moment, and certainly a great improvement over the status quo.

I'd be happy to merge this but I think we should get the OK from either @sphuber or @giovannipizzi as well.

@szoupanos
Contributor Author

@ltalirz That's a good point... When I mentioned that the speed is better than what I expected, I looked only at the import speed of Django vs SQLA in the current implementation and branch. And indeed, here the SQLA import is just 50% slower than the Django import (whereas I expected Django to be 4 times faster than SQLA, given that this group-addition optimisation is slower than the raw SQL optimisation that we did in #2471).

I didn't compare the numbers with the 0.12 implementation (#2471) - thanks for doing this - and I am a bit surprised. My expectation was that Django would have a speed similar to #2471 and that SQLA would be 2 times slower than the SQLA implementation of #2471.
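As a sanity check, the "50% slower" figure follows directly from the wall-clock times posted above (2m29.966s for Django, 3m33.812s for SQLA on the 10k-node file):

```python
# Slowdown of the SQLA import relative to Django, from the measured times.
django_s = 2 * 60 + 29.966   # 2m29.966s -> 149.966 s
sqla_s = 3 * 60 + 33.812     # 3m33.812s -> 213.812 s
ratio = sqla_s / django_s
print(round(ratio, 2))  # -> 1.43, i.e. roughly 40-50% slower
```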

…des the option to bypass the SQLA ORM and speed up the addition of nodes to a group.
This option is only available at the Backend level, and therefore only to users who need this speedup (ORM-level users should not be aware of such options).
The SQLA import was modified to use this optimisation.
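The layering described in the commit message can be illustrated schematically. The class and argument names below (`BackendGroup`, `skip_orm`) are illustrative stand-ins, not AiiDA's actual API: the point is only that the escape hatch lives at the backend level, while the ORM-level class hides it.

```python
# Schematic sketch (illustrative names, not AiiDA's real classes) of the
# layering described above: the ORM-level Group hides the optimisation,
# while the backend entity exposes a skip_orm-style fast path.
class BackendGroup:
    """Backend-level group: may expose performance escape hatches."""

    def __init__(self):
        self.node_ids = []

    def add_nodes(self, node_ids, skip_orm=False):
        if skip_orm:
            # Fast path: conceptually one bulk INSERT for all rows.
            self.node_ids.extend(node_ids)
        else:
            # Slow path: conceptually one ORM flush per node.
            for node_id in node_ids:
                self.node_ids.append(node_id)


class Group:
    """ORM-level group: users never see skip_orm."""

    def __init__(self):
        self._backend_entity = BackendGroup()

    def add_nodes(self, node_ids):
        self._backend_entity.add_nodes(node_ids)


# Normal users go through the ORM-level Group and never see the flag;
# the modified SQLA import talks to the backend entity directly.
group = Group()
group.add_nodes(range(5))

fast_group = BackendGroup()
fast_group.add_nodes(range(10000), skip_orm=True)
```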
@szoupanos szoupanos force-pushed the issue_1319_for_provenance_redesign branch from 5340674 to c386e06 Compare February 28, 2019 12:24
@ltalirz
Member

ltalirz commented Mar 1, 2019

@szoupanos From your last message it is not entirely clear whether you still intend to work on this. Could you perhaps let us know?

@szoupanos
Contributor Author

I would say no for the moment, since I understand that the current overall performance is still acceptable for you (and the performance of the optimised SQLA import is much better than that of the non-optimised SQLA import we had before).

I think that this is OK for the 1.0 beta; let's assess our goals for 1.0 given the limited time that we have (e.g. I think that it is worth the effort to try to implement Django JSONB rather than continuing to investigate this).

@sphuber sphuber self-requested a review March 2, 2019 16:10