
Adding many nodes (~ 10000 ) to a group is extremely slow (sqlalchemy backend) #1319

Closed
yakutovicha opened this issue Mar 22, 2018 · 16 comments


yakutovicha commented Mar 22, 2018

While appending many nodes to a group, I noticed that it takes an extremely long time.

What I am doing is the following:

In [1]: n = QueryBuilder()

In [2]: n.append(Group) 
Out[2]: <aiida.orm.querybuilder.QueryBuilder at 0x7f0cb956a710>

In [3]: g = n.all()[1][0]
In [4]: g
Out[4]: <Group: "test" [user-defined], of user some.body@xyz.com>

In [5]: n = QueryBuilder()

In [6]: from aiida.orm.data.cif import CifData

In [7]: n.append(CifData)
Out[7]: <aiida.orm.querybuilder.QueryBuilder at 0x7f0cb8c2efd0>

In [8]: Group
Out[8]: aiida.orm.implementation.sqlalchemy.group.Group

In [9]: flat_list = [item for sublist in n.all() for item in sublist]
In [10]: flat_list[0]
Out[10]: <CifData: uuid: dd0f8540-00c2-42d2-8e58-8108eeb7a59e (pk: 1)>

In [11]: g.add_nodes(flat_list)    <--- Takes more than 30 minutes

The reason I am doing this is that I want to export the whole database using the following command:
verdi export create --group_names test2 test2.export

To reproduce the problem, I attach a script that appends 10000 CifData nodes to the AiiDA database.
upload_10000_cifs.tar.gz

@szoupanos

I will try to look at it; if not, @sphuber or @lekah can also have a look.

@szoupanos

Yes, it looks slower than Django (by an order of magnitude):

(aiidapy) aiida@ubuntu-aiida-vm1:~/issue_1319$ verdi -p sqla_issue_1319 run test_script.py
Creating a temp group
Number of available Cif data 2356
Adding nodes one by one
Counter: 100, Time elapsed in ms: 2.44834113121, Throughput (time per 100 nodes): 2.44834113121
Counter: 200, Time elapsed in ms: 5.07201814651, Throughput (time per 100 nodes): 2.6236770153
Counter: 300, Time elapsed in ms: 8.32266497612, Throughput (time per 100 nodes): 3.25064682961
Counter: 400, Time elapsed in ms: 12.194149971, Throughput (time per 100 nodes): 3.87148499489
Counter: 500, Time elapsed in ms: 16.7329201698, Throughput (time per 100 nodes): 4.53877019882
Counter: 600, Time elapsed in ms: 21.8637630939, Throughput (time per 100 nodes): 5.13084292412
Counter: 700, Time elapsed in ms: 27.663932085, Throughput (time per 100 nodes): 5.80016899109
Counter: 800, Time elapsed in ms: 34.2302181721, Throughput (time per 100 nodes): 6.56628608704
Counter: 900, Time elapsed in ms: 41.9461250305, Throughput (time per 100 nodes): 7.71590685844
Counter: 1000, Time elapsed in ms: 49.69445014, Throughput (time per 100 nodes): 7.74832510948
Time elapsed in ms 49.6986660957
Deleting the created group
Creating a temp group
Adding nodes in batches of 100
Counter: 1, Time elapsed in ms: 1.13554191589, Throughput (time per 100 nodes): 1.13554191589, chunk size: 100
Counter: 2, Time elapsed in ms: 2.53165006638, Throughput (time per 100 nodes): 1.39610815048, chunk size: 100
Counter: 3, Time elapsed in ms: 4.49646306038, Throughput (time per 100 nodes): 1.964812994, chunk size: 100
Counter: 4, Time elapsed in ms: 6.94511389732, Throughput (time per 100 nodes): 2.44865083694, chunk size: 100
Counter: 5, Time elapsed in ms: 9.95142793655, Throughput (time per 100 nodes): 3.00631403923, chunk size: 100
Counter: 6, Time elapsed in ms: 13.4705839157, Throughput (time per 100 nodes): 3.51915597916, chunk size: 100
Counter: 7, Time elapsed in ms: 17.5556740761, Throughput (time per 100 nodes): 4.08509016037, chunk size: 100
Counter: 8, Time elapsed in ms: 22.1915791035, Throughput (time per 100 nodes): 4.63590502739, chunk size: 100
Counter: 9, Time elapsed in ms: 27.4612090588, Throughput (time per 100 nodes): 5.26962995529, chunk size: 100
Counter: 10, Time elapsed in ms: 35.200799942, Throughput (time per 100 nodes): 7.73959088326, chunk size: 100
Time elapsed in ms 35.2008600235
(aiidapy) aiida@ubuntu-aiida-vm1:~/issue_1319$ verdi -p dj_issue_1319 run test_script.py
Creating a temp group
Number of available Cif data 3973
Adding nodes one by one
Counter: 100, Time elapsed in ms: 0.949183940887, Throughput (time per 100 nodes): 0.949183940887
Counter: 200, Time elapsed in ms: 1.34713888168, Throughput (time per 100 nodes): 0.397954940796
Counter: 300, Time elapsed in ms: 1.73349189758, Throughput (time per 100 nodes): 0.3863530159
Counter: 400, Time elapsed in ms: 2.11793279648, Throughput (time per 100 nodes): 0.384440898895
Counter: 500, Time elapsed in ms: 2.53088498116, Throughput (time per 100 nodes): 0.412952184677
Counter: 600, Time elapsed in ms: 2.91644692421, Throughput (time per 100 nodes): 0.385561943054
Counter: 700, Time elapsed in ms: 3.30022978783, Throughput (time per 100 nodes): 0.383782863617
Counter: 800, Time elapsed in ms: 3.70153880119, Throughput (time per 100 nodes): 0.401309013367
Counter: 900, Time elapsed in ms: 4.11438179016, Throughput (time per 100 nodes): 0.412842988968
Counter: 1000, Time elapsed in ms: 4.51084685326, Throughput (time per 100 nodes): 0.396465063095
Time elapsed in ms 4.5131418705
Deleting the created group
Creating a temp group
Adding nodes in batches of 100
Counter: 1, Time elapsed in ms: 0.594545125961, Throughput (time per 100 nodes): 0.594545125961, chunk size: 100
Counter: 2, Time elapsed in ms: 0.60437297821, Throughput (time per 100 nodes): 0.00982785224915, chunk size: 100
Counter: 3, Time elapsed in ms: 0.614166975021, Throughput (time per 100 nodes): 0.00979399681091, chunk size: 100
Counter: 4, Time elapsed in ms: 0.623984098434, Throughput (time per 100 nodes): 0.00981712341309, chunk size: 100
Counter: 5, Time elapsed in ms: 0.636304140091, Throughput (time per 100 nodes): 0.0123200416565, chunk size: 100
Counter: 6, Time elapsed in ms: 0.65016412735, Throughput (time per 100 nodes): 0.0138599872589, chunk size: 100
Counter: 7, Time elapsed in ms: 0.661014080048, Throughput (time per 100 nodes): 0.0108499526978, chunk size: 100
Counter: 8, Time elapsed in ms: 0.677121162415, Throughput (time per 100 nodes): 0.0161070823669, chunk size: 100
Counter: 9, Time elapsed in ms: 0.687810182571, Throughput (time per 100 nodes): 0.0106890201569, chunk size: 100
Counter: 10, Time elapsed in ms: 0.698170185089, Throughput (time per 100 nodes): 0.0103600025177, chunk size: 100
Time elapsed in ms 0.698220014572

I tested it with the following script
test_script.py.txt

@yakutovicha

I wonder why the time per 100 nodes increases for SQLAlchemy but stays the same for Django. Could it be related to issue #1289?
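One mechanism that would produce exactly this growth (a hypothetical plain-Python analogue, not the AiiDA code itself) is a linear membership scan over an ever-growing collection: if every added node is checked with `in` against the list of nodes already in the group, each check costs time proportional to the current group size, so the total work grows quadratically:

```python
def memberships_scanned(n):
    """Count elements scanned by `x not in group` checks while adding n new items."""
    group, scanned = [], 0
    for i in range(n):
        scanned += len(group)  # a list `in` test scans every element when x is absent
        if i not in group:
            group.append(i)
    return scanned

print(memberships_scanned(100))   # 4950 comparisons
print(memberships_scanned(1000))  # 499500: 10x the items, ~100x the work
```

This matches the symptom above: the per-chunk time rises roughly linearly with the counter for SQLAlchemy, while a backend that avoids the per-item scan stays flat.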


szoupanos commented Mar 23, 2018

With slightly better session management we get the following:

(aiidapy) aiida@ubuntu-aiida-vm1:~/issue_1319$ verdi -p sqla_issue_1319 run test_script.py
Creating a temp group
Number of available Cif data 2009
Adding nodes one by one
Counter: 100, Time elapsed in ms: 1.01982784271, Throughput (time per 100 nodes): 1.01982784271
Counter: 200, Time elapsed in ms: 1.72021579742, Throughput (time per 100 nodes): 0.700387954712
Counter: 300, Time elapsed in ms: 2.42927479744, Throughput (time per 100 nodes): 0.709059000015
Counter: 400, Time elapsed in ms: 3.12652492523, Throughput (time per 100 nodes): 0.697250127792
Counter: 500, Time elapsed in ms: 3.80724000931, Throughput (time per 100 nodes): 0.680715084076
Counter: 600, Time elapsed in ms: 4.50745391846, Throughput (time per 100 nodes): 0.700213909149
Counter: 700, Time elapsed in ms: 5.22011780739, Throughput (time per 100 nodes): 0.712663888931
Counter: 800, Time elapsed in ms: 5.89732599258, Throughput (time per 100 nodes): 0.677208185196
Counter: 900, Time elapsed in ms: 6.50354790688, Throughput (time per 100 nodes): 0.606221914291
Counter: 1000, Time elapsed in ms: 7.12604689598, Throughput (time per 100 nodes): 0.622498989105
Time elapsed in ms 7.13044190407
Size of the group 1000
Deleting the created group
Creating a temp group
Number of available Cif data 2009
Adding nodes in batches of 100
Counter: 1, Time elapsed in ms: 1.07292699814, Throughput (time per 100 nodes): 1.07292699814, chunk size: 100
Counter: 2, Time elapsed in ms: 2.34552097321, Throughput (time per 100 nodes): 1.27259397507, chunk size: 100
Counter: 3, Time elapsed in ms: 4.1684179306, Throughput (time per 100 nodes): 1.8228969574, chunk size: 100
Counter: 4, Time elapsed in ms: 6.62577581406, Throughput (time per 100 nodes): 2.45735788345, chunk size: 100
Counter: 5, Time elapsed in ms: 9.93918085098, Throughput (time per 100 nodes): 3.31340503693, chunk size: 100
Counter: 6, Time elapsed in ms: 14.1814668179, Throughput (time per 100 nodes): 4.24228596687, chunk size: 100
Counter: 7, Time elapsed in ms: 19.2133069038, Throughput (time per 100 nodes): 5.03184008598, chunk size: 100
Counter: 8, Time elapsed in ms: 24.7534418106, Throughput (time per 100 nodes): 5.54013490677, chunk size: 100
Counter: 9, Time elapsed in ms: 30.6617147923, Throughput (time per 100 nodes): 5.90827298164, chunk size: 100
Counter: 10, Time elapsed in ms: 39.5465018749, Throughput (time per 100 nodes): 8.88478708267, chunk size: 100
Time elapsed in ms 39.5472309589
Size of the group 1000

See, e.g., the session.expunge_all() call in aiida.orm.implementation.sqlalchemy.group.Group#add_nodes:

        session = get_scoped_session()
        session.expunge_all()
        # First convert to a list
        if isinstance(nodes, (Node, DbNode)):
            nodes = [nodes]

        if isinstance(nodes, basestring) or not isinstance(
                nodes, collections.Iterable):
            raise TypeError("Invalid type passed as the 'nodes' parameter to "
                            "add_nodes, can only be a Node, DbNode, or a list "
                            "of such objects, it is instead {}".format(
                str(type(nodes))))

        list_nodes = []
        # session.add(self._dbgroup)
        for node in nodes:
            if not isinstance(node, (Node, DbNode)):
                raise TypeError("Invalid type of one of the elements passed "
                                "to add_nodes, it should be either a Node or "
                                "a DbNode, it is instead {}".format(
                    str(type(node))))

            if node.id is None:
                raise ValueError("At least one of the provided nodes is "
                                 "unstored, stopping...")
            if isinstance(node, Node):
                to_add = node.dbnode
            else:
                to_add = node

            if to_add not in self._dbgroup.dbnodes:
                # ~ list_nodes.append(to_add)
                self._dbgroup.dbnodes.append(to_add)
                session.add(self._dbgroup)
                session.add(to_add)
        session.commit()

But it needs a bit of thinking.
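One cheaper pattern (a plain-Python sketch under assumed names, not the actual AiiDA API) would be to fetch the ids already in the group once, filter the candidates in memory with a set, and only then append the missing ones, instead of running `to_add not in self._dbgroup.dbnodes` for every node:

```python
def add_nodes_filtered(existing_ids, candidates):
    """Return only the objects not yet in the group.

    existing_ids: ids already in the group (fetched once from the DB).
    candidates:   (node_id, obj) pairs the caller wants to add.
    """
    seen = set(existing_ids)      # one O(n) setup instead of n O(n) scans
    to_append = []
    for node_id, obj in candidates:
        if node_id not in seen:   # O(1) membership test
            seen.add(node_id)
            to_append.append(obj)
    return to_append

# duplicates (id 1 already present, id 3 passed twice) are silently dropped
print(add_nodes_filtered([1, 2], [(1, "a"), (3, "b"), (3, "b2"), (4, "c")]))
```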

If Leo/Seb don't find a solution, I will have a look at it as soon as I come back.

@szoupanos

@yakutovicha
I think so. My impression is that both issues are related to the session, which we never clean: we keep adding objects (which, I suppose, must be kept in sync with the DB) and commit them from time to time. But it needs a closer look before we jump to conclusions.

@sphuber sphuber added this to the v0.12.0 milestone Apr 4, 2018

sphuber commented Apr 4, 2018

One source of the problem might be missing indexes in the SqlAlchemy schema:

aiida_sdb=# \d+ db_dbgroup_dbnodes
                                               Table "public.db_dbgroup_dbnodes"
   Column   |  Type   |                            Modifiers                            | Storage | Stats target | Description 
------------+---------+-----------------------------------------------------------------+---------+--------------+-------------
 id         | integer | not null default nextval('db_dbgroup_dbnodes_id_seq'::regclass) | plain   |              | 
 dbnode_id  | integer |                                                                 | plain   |              | 
 dbgroup_id | integer |                                                                 | plain   |              | 
Indexes:
    "db_dbgroup_dbnodes_pkey" PRIMARY KEY, btree (id)
Foreign-key constraints:
    "db_dbgroup_dbnodes_dbgroup_id_fkey" FOREIGN KEY (dbgroup_id) REFERENCES db_dbgroup(id) DEFERRABLE INITIALLY DEFERRED
    "db_dbgroup_dbnodes_dbnode_id_fkey" FOREIGN KEY (dbnode_id) REFERENCES db_dbnode(id) DEFERRABLE INITIALLY DEFERRED

Notice the difference with the Django schema:

                                               Table "public.db_dbgroup_dbnodes"
   Column   |  Type   |                            Modifiers                            | Storage | Stats target | Description 
------------+---------+-----------------------------------------------------------------+---------+--------------+-------------
 id         | integer | not null default nextval('db_dbgroup_dbnodes_id_seq'::regclass) | plain   |              | 
 dbgroup_id | integer | not null                                                        | plain   |              | 
 dbnode_id  | integer | not null                                                        | plain   |              | 
Indexes:
    "db_dbgroup_dbnodes_pkey" PRIMARY KEY, btree (id)
    "db_dbgroup_dbnodes_dbgroup_id_dbnode_id_key" UNIQUE CONSTRAINT, btree (dbgroup_id, dbnode_id)
    "db_dbgroup_dbnodes_7a672316" btree (dbnode_id)
    "db_dbgroup_dbnodes_a0b4eda0" btree (dbgroup_id)
Foreign-key constraints:
    "db_dbgroup_dbnodes_dbgroup_id_32d69f1acbc4c03c_fk_db_dbgroup_id" FOREIGN KEY (dbgroup_id) REFERENCES db_dbgroup(id) DEFERRABLE INITIALLY DEFERRED
    "db_dbgroup_dbnodes_dbnode_id_53a1829a1973b99c_fk_db_dbnode_id" FOREIGN KEY (dbnode_id) REFERENCES db_dbnode(id) DEFERRABLE INITIALLY DEFERRED

Note that the Django schema has a UNIQUE constraint on the (group, node) pair, which is missing in SQLAlchemy and which we should add. Additionally, the Django schema has indices on all three columns, whereas SQLAlchemy only has one on the primary key.
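With such a UNIQUE constraint in place, the database itself rejects duplicate memberships, so an application-level containment test becomes unnecessary. A minimal sketch using stdlib sqlite3 (rather than the actual PostgreSQL setup; table and column names mirror the schema above, and `INSERT OR IGNORE` stands in for PostgreSQL's `ON CONFLICT DO NOTHING`):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE db_dbgroup_dbnodes (
        id INTEGER PRIMARY KEY,
        dbnode_id INTEGER NOT NULL,
        dbgroup_id INTEGER NOT NULL,
        UNIQUE (dbgroup_id, dbnode_id)
    )
""")

rows = [(1, n) for n in (10, 11, 12, 11)]  # node 11 submitted twice
conn.executemany(
    "INSERT OR IGNORE INTO db_dbgroup_dbnodes (dbgroup_id, dbnode_id) VALUES (?, ?)",
    rows,
)
count = conn.execute("SELECT COUNT(*) FROM db_dbgroup_dbnodes").fetchone()[0]
print(count)  # 3: the duplicate membership was dropped by the constraint
```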

@szoupanos

I added the indexes and I don't see much difference in time.
The containment test seems to be the most costly operation; maybe we can find a way to do it more efficiently (by checking the containment of a whole set of nodes in a single query?).
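The "containment of a set of nodes in a single query" idea could look like the following sketch (stdlib sqlite3 instead of the real PostgreSQL/SQLAlchemy stack; the schema is a simplified version of db_dbgroup_dbnodes): ask the database once which candidate ids are already in the group, then insert only the rest.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE db_dbgroup_dbnodes (dbgroup_id INTEGER, dbnode_id INTEGER)")
# group 1 already contains nodes 0..49
conn.executemany("INSERT INTO db_dbgroup_dbnodes VALUES (1, ?)",
                 [(n,) for n in range(50)])

candidates = list(range(40, 60))           # 40..49 already present, 50..59 new
placeholders = ",".join("?" * len(candidates))
already = {row[0] for row in conn.execute(
    "SELECT dbnode_id FROM db_dbgroup_dbnodes "
    "WHERE dbgroup_id = 1 AND dbnode_id IN (%s)" % placeholders,
    candidates)}
missing = [n for n in candidates if n not in already]
print(missing)  # ids 50..59 are not yet in the group
```

This replaces N per-node containment checks with one indexed query plus an O(1) set lookup per candidate.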

----------------------------------------------------------------------------------------
Original code
----------------------------------------------------------------------------------------
(aiidapy) aiida@ubuntu-aiida-vm1:~/issue_1319$ verdi -p sqla_issue_1319 run test_script.py
Creating a temp group
Number of available Cif data 3019
Adding nodes one by one
Counter: 100, Time elapsed in ms: 3.94729804993, Throughput (time per 100 nodes): 3.94729804993
Counter: 200, Time elapsed in ms: 8.0202190876, Throughput (time per 100 nodes): 4.07292103767
Counter: 300, Time elapsed in ms: 12.7691149712, Throughput (time per 100 nodes): 4.74889588356
Counter: 400, Time elapsed in ms: 18.1638829708, Throughput (time per 100 nodes): 5.39476799965
Counter: 500, Time elapsed in ms: 24.1637990475, Throughput (time per 100 nodes): 5.99991607666
Counter: 600, Time elapsed in ms: 30.907553196, Throughput (time per 100 nodes): 6.74375414848
Counter: 700, Time elapsed in ms: 38.2967870235, Throughput (time per 100 nodes): 7.38923382759
Counter: 800, Time elapsed in ms: 46.3559901714, Throughput (time per 100 nodes): 8.05920314789
Counter: 900, Time elapsed in ms: 55.335242033, Throughput (time per 100 nodes): 8.97925186157
Counter: 1000, Time elapsed in ms: 66.8341851234, Throughput (time per 100 nodes): 11.4989430904
Time elapsed in ms 66.8365550041
Deleting the created group
Creating a temp group
Adding nodes in batches of 100
Counter: 1, Time elapsed in ms: 1.17923212051, Throughput (time per 100 nodes): 1.17923212051, chunk size: 100
Counter: 2, Time elapsed in ms: 2.78667020798, Throughput (time per 100 nodes): 1.60743808746, chunk size: 100
Counter: 3, Time elapsed in ms: 5.02131104469, Throughput (time per 100 nodes): 2.23464083672, chunk size: 100
Counter: 4, Time elapsed in ms: 7.82664012909, Throughput (time per 100 nodes): 2.8053290844, chunk size: 100
Counter: 5, Time elapsed in ms: 11.1074562073, Throughput (time per 100 nodes): 3.28081607819, chunk size: 100
Counter: 6, Time elapsed in ms: 15.0134861469, Throughput (time per 100 nodes): 3.90602993965, chunk size: 100
Counter: 7, Time elapsed in ms: 19.3837821484, Throughput (time per 100 nodes): 4.37029600143, chunk size: 100
Counter: 8, Time elapsed in ms: 24.4524521828, Throughput (time per 100 nodes): 5.06867003441, chunk size: 100
Counter: 9, Time elapsed in ms: 30.1628420353, Throughput (time per 100 nodes): 5.71038985252, chunk size: 100
Counter: 10, Time elapsed in ms: 36.2901990414, Throughput (time per 100 nodes): 6.12735700607, chunk size: 100
Time elapsed in ms 36.2902669907


----------------------------------------------------------------------------------------
Without containment test
----------------------------------------------------------------------------------------

(aiidapy) aiida@ubuntu-aiida-vm1:~/issue_1319$ verdi -p sqla_issue_1319 run test_script.py
Creating a temp group
Number of available Cif data 3019
Adding nodes one by one
Counter: 100, Time elapsed in ms: 3.41510105133, Throughput (time per 100 nodes): 3.41510105133
Counter: 200, Time elapsed in ms: 6.32242798805, Throughput (time per 100 nodes): 2.90732693672
Counter: 300, Time elapsed in ms: 9.21573114395, Throughput (time per 100 nodes): 2.8933031559
Counter: 400, Time elapsed in ms: 12.0991930962, Throughput (time per 100 nodes): 2.88346195221
Counter: 500, Time elapsed in ms: 15.329128027, Throughput (time per 100 nodes): 3.2299349308
Counter: 600, Time elapsed in ms: 18.2226450443, Throughput (time per 100 nodes): 2.89351701736
Counter: 700, Time elapsed in ms: 21.1963210106, Throughput (time per 100 nodes): 2.97367596626
Counter: 800, Time elapsed in ms: 24.208920002, Throughput (time per 100 nodes): 3.01259899139
Counter: 900, Time elapsed in ms: 27.3985540867, Throughput (time per 100 nodes): 3.1896340847
Counter: 1000, Time elapsed in ms: 30.3508961201, Throughput (time per 100 nodes): 2.95234203339
Time elapsed in ms 30.3536851406
Deleting the created group
Creating a temp group
Adding nodes in batches of 100
Counter: 1, Time elapsed in ms: 0.271717071533, Throughput (time per 100 nodes): 0.271717071533, chunk size: 100
Counter: 2, Time elapsed in ms: 0.539061069489, Throughput (time per 100 nodes): 0.267343997955, chunk size: 100
Counter: 3, Time elapsed in ms: 0.809602022171, Throughput (time per 100 nodes): 0.270540952682, chunk size: 100
Counter: 4, Time elapsed in ms: 1.08010387421, Throughput (time per 100 nodes): 0.270501852036, chunk size: 100
Counter: 5, Time elapsed in ms: 1.36695885658, Throughput (time per 100 nodes): 0.286854982376, chunk size: 100
Counter: 6, Time elapsed in ms: 1.63579297066, Throughput (time per 100 nodes): 0.268834114075, chunk size: 100
Counter: 7, Time elapsed in ms: 1.90855193138, Throughput (time per 100 nodes): 0.272758960724, chunk size: 100
Counter: 8, Time elapsed in ms: 2.17859196663, Throughput (time per 100 nodes): 0.270040035248, chunk size: 100
Counter: 9, Time elapsed in ms: 2.45090293884, Throughput (time per 100 nodes): 0.272310972214, chunk size: 100
Counter: 10, Time elapsed in ms: 2.71773600578, Throughput (time per 100 nodes): 0.26683306694, chunk size: 100
Time elapsed in ms 2.71839499474


----------------------------------------------------------------------------------------
Without containment test and with expunge_all()

---
        session.expunge_all()

        session.add(self._dbgroup)
        for node in nodes:
            if not isinstance(node, (Node, DbNode)):
                raise TypeError("Invalid type of one of the elements passed "
                                "to add_nodes, it should be either a Node or "
                                "a DbNode, it is instead {}".format(
                    str(type(node))))

            if node.id is None:
                raise ValueError("At least one of the provided nodes is "
                                 "unstored, stopping...")
            if isinstance(node, Node):
                to_add = node.dbnode
            else:
                to_add = node

            # if to_add not in self._dbgroup.dbnodes:
            #     # ~ list_nodes.append(to_add)
            #     self._dbgroup.dbnodes.append(to_add)

            self._dbgroup.dbnodes.append(to_add)
            session.add(to_add)

        session.commit()
----------------------------------------------------------------------------------------

(aiidapy) aiida@ubuntu-aiida-vm1:~/issue_1319$ verdi -p sqla_issue_1319 run test_script.py
Creating a temp group
Number of available Cif data 3019
Adding nodes one by one
Counter: 100, Time elapsed in ms: 2.18053793907, Throughput (time per 100 nodes): 2.18053793907
Counter: 200, Time elapsed in ms: 3.00454902649, Throughput (time per 100 nodes): 0.824011087418
Counter: 300, Time elapsed in ms: 3.62919211388, Throughput (time per 100 nodes): 0.624643087387
Counter: 400, Time elapsed in ms: 4.26630592346, Throughput (time per 100 nodes): 0.637113809586
Counter: 500, Time elapsed in ms: 4.85422706604, Throughput (time per 100 nodes): 0.587921142578
Counter: 600, Time elapsed in ms: 5.44937396049, Throughput (time per 100 nodes): 0.595146894455
Counter: 700, Time elapsed in ms: 6.11360406876, Throughput (time per 100 nodes): 0.664230108261
Counter: 800, Time elapsed in ms: 6.77587509155, Throughput (time per 100 nodes): 0.662271022797
Counter: 900, Time elapsed in ms: 7.36756801605, Throughput (time per 100 nodes): 0.5916929245
Counter: 1000, Time elapsed in ms: 7.96999406815, Throughput (time per 100 nodes): 0.602426052094
Time elapsed in ms 7.97175598145
Deleting the created group
Creating a temp group
Number of available Cif data 3019
Adding nodes in batches of 100
Counter: 1, Time elapsed in ms: 0.0923318862915, Throughput (time per 100 nodes): 0.0923318862915, chunk size: 100
Counter: 2, Time elapsed in ms: 0.186807870865, Throughput (time per 100 nodes): 0.0944759845734, chunk size: 100
Counter: 3, Time elapsed in ms: 0.272273778915, Throughput (time per 100 nodes): 0.0854659080505, chunk size: 100
Counter: 4, Time elapsed in ms: 0.362018823624, Throughput (time per 100 nodes): 0.0897450447083, chunk size: 100
Counter: 5, Time elapsed in ms: 0.504228830338, Throughput (time per 100 nodes): 0.142210006714, chunk size: 100
Counter: 6, Time elapsed in ms: 0.588252782822, Throughput (time per 100 nodes): 0.0840239524841, chunk size: 100
Counter: 7, Time elapsed in ms: 0.672455787659, Throughput (time per 100 nodes): 0.084203004837, chunk size: 100
Counter: 8, Time elapsed in ms: 0.756693840027, Throughput (time per 100 nodes): 0.0842380523682, chunk size: 100
Counter: 9, Time elapsed in ms: 0.847168922424, Throughput (time per 100 nodes): 0.0904750823975, chunk size: 100
Counter: 10, Time elapsed in ms: 0.93253993988, Throughput (time per 100 nodes): 0.0853710174561, chunk size: 100
Time elapsed in ms 0.933213949203


----------------------------------------------------------------------------------------
With expunge_all()
And containment test without indexes

---
        session.add(self._dbgroup)
        for node in nodes:
            if not isinstance(node, (Node, DbNode)):
                raise TypeError("Invalid type of one of the elements passed "
                                "to add_nodes, it should be either a Node or "
                                "a DbNode, it is instead {}".format(
                    str(type(node))))

            if node.id is None:
                raise ValueError("At least one of the provided nodes is "
                                 "unstored, stopping...")
            if isinstance(node, Node):
                to_add = node.dbnode
            else:
                to_add = node

            if to_add not in self._dbgroup.dbnodes:
                # ~ list_nodes.append(to_add)
                self._dbgroup.dbnodes.append(to_add)

            # self._dbgroup.dbnodes.append(to_add)
            session.add(to_add)
---

drop index db_dbgroup_dbnodes_7a672316;
drop index db_dbgroup_dbnodes_a0b4eda0;
drop index db_dbgroup_dbnodes_dbgroup_id_dbnode_id_key;
----------------------------------------------------------------------------------------


(aiidapy) aiida@ubuntu-aiida-vm1:~/issue_1319$ verdi -p sqla_issue_1319 run test_script.py
Creating a temp group
Number of available Cif data 3019
Adding nodes one by one
Counter: 100, Time elapsed in ms: 1.66245794296, Throughput (time per 100 nodes): 1.66245794296
Counter: 200, Time elapsed in ms: 3.41930198669, Throughput (time per 100 nodes): 1.75684404373
Counter: 300, Time elapsed in ms: 5.82579803467, Throughput (time per 100 nodes): 2.40649604797
Counter: 400, Time elapsed in ms: 8.93039608002, Throughput (time per 100 nodes): 3.10459804535
Counter: 500, Time elapsed in ms: 12.642113924, Throughput (time per 100 nodes): 3.71171784401
Counter: 600, Time elapsed in ms: 17.5051529408, Throughput (time per 100 nodes): 4.86303901672
Counter: 700, Time elapsed in ms: 23.0371758938, Throughput (time per 100 nodes): 5.53202295303
Counter: 800, Time elapsed in ms: 29.9214580059, Throughput (time per 100 nodes): 6.88428211212
Counter: 900, Time elapsed in ms: 36.4096589088, Throughput (time per 100 nodes): 6.48820090294
Counter: 1000, Time elapsed in ms: 43.2980780602, Throughput (time per 100 nodes): 6.88841915131
Time elapsed in ms 43.3001420498
Deleting the created group
Creating a temp group
Number of available Cif data 3019
Adding nodes in batches of 100
Counter: 1, Time elapsed in ms: 0.81974697113, Throughput (time per 100 nodes): 0.81974697113, chunk size: 100
Counter: 2, Time elapsed in ms: 2.16685199738, Throughput (time per 100 nodes): 1.34710502625, chunk size: 100
Counter: 3, Time elapsed in ms: 4.12957000732, Throughput (time per 100 nodes): 1.96271800995, chunk size: 100
Counter: 4, Time elapsed in ms: 6.52032613754, Throughput (time per 100 nodes): 2.39075613022, chunk size: 100
Counter: 5, Time elapsed in ms: 9.75746107101, Throughput (time per 100 nodes): 3.23713493347, chunk size: 100
Counter: 6, Time elapsed in ms: 13.5617980957, Throughput (time per 100 nodes): 3.80433702469, chunk size: 100
Counter: 7, Time elapsed in ms: 18.0056431293, Throughput (time per 100 nodes): 4.44384503365, chunk size: 100
Counter: 8, Time elapsed in ms: 23.15102005, Throughput (time per 100 nodes): 5.1453769207, chunk size: 100
Counter: 9, Time elapsed in ms: 29.3757650852, Throughput (time per 100 nodes): 6.22474503517, chunk size: 100
Counter: 10, Time elapsed in ms: 36.3811440468, Throughput (time per 100 nodes): 7.00537896156, chunk size: 100
Time elapsed in ms 36.3816270828


----------------------------------------------------------------------------------------
With expunge_all()
And containment test with indexes

create index db_dbgroup_dbnodes_7a672316 on db_dbgroup_dbnodes(dbnode_id);
create index db_dbgroup_dbnodes_a0b4eda0 on db_dbgroup_dbnodes(dbgroup_id);
create UNIQUE index db_dbgroup_dbnodes_dbgroup_id_dbnode_id_key on db_dbgroup_dbnodes(dbgroup_id, dbnode_id);
----------------------------------------------------------------------------------------

(aiidapy) aiida@ubuntu-aiida-vm1:~/issue_1319$ verdi -p sqla_issue_1319 run test_script.py
Creating a temp group
Number of available Cif data 3019
Adding nodes one by one
Counter: 100, Time elapsed in ms: 1.75411915779, Throughput (time per 100 nodes): 1.75411915779
Counter: 200, Time elapsed in ms: 3.47723603249, Throughput (time per 100 nodes): 1.72311687469
Counter: 300, Time elapsed in ms: 5.80984306335, Throughput (time per 100 nodes): 2.33260703087
Counter: 400, Time elapsed in ms: 8.8858230114, Throughput (time per 100 nodes): 3.07597994804
Counter: 500, Time elapsed in ms: 12.418721199, Throughput (time per 100 nodes): 3.53289818764
Counter: 600, Time elapsed in ms: 16.8210380077, Throughput (time per 100 nodes): 4.4023168087
Counter: 700, Time elapsed in ms: 21.8142621517, Throughput (time per 100 nodes): 4.99322414398
Counter: 800, Time elapsed in ms: 27.5076961517, Throughput (time per 100 nodes): 5.69343400002
Counter: 900, Time elapsed in ms: 33.853413105, Throughput (time per 100 nodes): 6.34571695328
Counter: 1000, Time elapsed in ms: 40.7423350811, Throughput (time per 100 nodes): 6.88892197609
Time elapsed in ms 40.7440190315
Deleting the created group
Creating a temp group
Number of available Cif data 3019
Adding nodes in batches of 100
Counter: 1, Time elapsed in ms: 0.80953502655, Throughput (time per 100 nodes): 0.80953502655, chunk size: 100
Counter: 2, Time elapsed in ms: 2.17273902893, Throughput (time per 100 nodes): 1.36320400238, chunk size: 100
Counter: 3, Time elapsed in ms: 4.28237295151, Throughput (time per 100 nodes): 2.10963392258, chunk size: 100
Counter: 4, Time elapsed in ms: 6.88155198097, Throughput (time per 100 nodes): 2.59917902946, chunk size: 100
Counter: 5, Time elapsed in ms: 10.1589868069, Throughput (time per 100 nodes): 3.2774348259, chunk size: 100
Counter: 6, Time elapsed in ms: 14.0899670124, Throughput (time per 100 nodes): 3.93098020554, chunk size: 100
Counter: 7, Time elapsed in ms: 18.6452338696, Throughput (time per 100 nodes): 4.55526685715, chunk size: 100
Counter: 8, Time elapsed in ms: 23.8521668911, Throughput (time per 100 nodes): 5.20693302155, chunk size: 100
Counter: 9, Time elapsed in ms: 29.6780250072, Throughput (time per 100 nodes): 5.82585811615, chunk size: 100
Counter: 10, Time elapsed in ms: 36.3754470348, Throughput (time per 100 nodes): 6.69742202759, chunk size: 100
Time elapsed in ms 36.3755548


szoupanos commented May 4, 2018

I am looking into this and here are some comments (what I write below is with expire_on_commit=False):
(a) SQLA has bulk functions for massive object insertions and updates that partially skip the ORM level, promising a great speed-up (an order of magnitude, if I read correctly): http://docs.sqlalchemy.org/en/latest/faq/performance.html
Unfortunately, such operations cannot be used to update relationships, which is our case.
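That said, since the group membership lives in a plain association table, one could side-step the relationship entirely and bulk-insert into that table directly (via executemany or SQLA's Core insert()). A sketch with stdlib sqlite3 under a simplified schema, not the actual AiiDA code:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE db_dbgroup_dbnodes (dbgroup_id INTEGER, dbnode_id INTEGER)")

group_id = 1
node_ids = range(1000)
# one bulk statement instead of 1000 ORM relationship appends
conn.executemany(
    "INSERT INTO db_dbgroup_dbnodes (dbgroup_id, dbnode_id) VALUES (?, ?)",
    [(group_id, n) for n in node_ids],
)
conn.commit()
size = conn.execute(
    "SELECT COUNT(*) FROM db_dbgroup_dbnodes WHERE dbgroup_id = ?", (group_id,)
).fetchone()[0]
print(size)  # 1000
```

The trade-off is that the ORM's in-memory relationship collection is bypassed, so the session would need to be expired/refreshed afterwards to see the new memberships.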

(b) What we do now is inefficient, since we add the objects to the relationship one by one
(at aiida.orm.implementation.sqlalchemy.group.Group#add_nodes):

            if to_add not in self._dbgroup.dbnodes:
                # ~ list_nodes.append(to_add)
                self._dbgroup.dbnodes.append(to_add)
(aiidapy) aiida@ubuntu-aiida-vm1:~/issue_1319$ verdi -p sqla_issue_1319 run test_script.py
#1 All available groups [[<Group: "Verdi autogroup on 2018-04-05 19:43:19" [type autogroup.run], of user aiida@localhost>], [<Group: "test_group" [user-defined], of user aiida@localhost>]]
Deleting the group  "test_group" [user-defined], of user aiida@localhost
#2 All available groups [[<Group: "Verdi autogroup on 2018-04-05 19:43:19" [type autogroup.run], of user aiida@localhost>]]
Creating a temp group
Number of available Cif data 3019
Adding all nodes
Time elapsed in ms 35.7646160126
The size of the group is  2
Deleting the created group
Creating a temp group
Number of available Cif data 3019
Adding nodes one by one
Counter: 100, Time elapsed in ms: 1.70406603813, Throughput (time per 100 nodes): 1.70406603813
Counter: 200, Time elapsed in ms: 3.55646085739, Throughput (time per 100 nodes): 1.85239481926
Counter: 300, Time elapsed in ms: 5.97326683998, Throughput (time per 100 nodes): 2.41680598259
Counter: 400, Time elapsed in ms: 8.96025085449, Throughput (time per 100 nodes): 2.98698401451
Counter: 500, Time elapsed in ms: 12.7274558544, Throughput (time per 100 nodes): 3.76720499992
Counter: 600, Time elapsed in ms: 17.0921828747, Throughput (time per 100 nodes): 4.36472702026
Counter: 700, Time elapsed in ms: 22.4721388817, Throughput (time per 100 nodes): 5.379956007
Counter: 800, Time elapsed in ms: 28.7427699566, Throughput (time per 100 nodes): 6.27063107491
Counter: 900, Time elapsed in ms: 35.3984880447, Throughput (time per 100 nodes): 6.65571808815
Counter: 1000, Time elapsed in ms: 42.3793978691, Throughput (time per 100 nodes): 6.98090982437
Time elapsed in ms 42.3813278675
The size of the group is  2
[[<Group: "Verdi autogroup on 2018-04-05 19:43:19" [type autogroup.run], of user aiida@localhost>], [<Group: "test_group" [user-defined], of user aiida@localhost>]]
"Verdi autogroup on 2018-04-05 19:43:19" [type autogroup.run], of user aiida@localhost
name ==> Verdi autogroup on 2018-04-05 19:43:19
desc ==>
The size of the group is  1000
Deleting the created group
Creating a temp group
Number of available Cif data 3019
Adding nodes in batches of 100
Counter: 1, Time elapsed in ms: 1.02550506592, Throughput (time per 100 nodes): 1.02550506592, chunk size: 100
Counter: 2, Time elapsed in ms: 2.52159404755, Throughput (time per 100 nodes): 1.49608898163, chunk size: 100
Counter: 3, Time elapsed in ms: 4.62370800972, Throughput (time per 100 nodes): 2.10211396217, chunk size: 100
Counter: 4, Time elapsed in ms: 7.25097894669, Throughput (time per 100 nodes): 2.62727093697, chunk size: 100
Counter: 5, Time elapsed in ms: 10.4470250607, Throughput (time per 100 nodes): 3.19604611397, chunk size: 100
Counter: 6, Time elapsed in ms: 14.4495790005, Throughput (time per 100 nodes): 4.00255393982, chunk size: 100
Counter: 7, Time elapsed in ms: 19.1723568439, Throughput (time per 100 nodes): 4.72277784348, chunk size: 100
Counter: 8, Time elapsed in ms: 24.5407998562, Throughput (time per 100 nodes): 5.36844301224, chunk size: 100
Counter: 9, Time elapsed in ms: 32.4002580643, Throughput (time per 100 nodes): 7.85945820808, chunk size: 100
Counter: 10, Time elapsed in ms: 40.6627130508, Throughput (time per 100 nodes): 8.26245498657, chunk size: 100
Time elapsed in ms 40.662804842
The size of the group is  1000

We could do something like
self._dbgroup.dbnodes = list_dbnodes + list(self._dbgroup.dbnodes)
which improves performance when adding many nodes at once (e.g. for 2k nodes the time drops from 35 s to 2.6 s). However, it is slower when adding nodes one by one.
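For reference, the batched runs in the timings above split the node list with a simple chunking helper along these lines (an illustrative sketch, not the actual test script; `g.add_nodes` is the AiiDA call):

```python
def chunks(seq, size):
    """Yield successive slices of `seq` of at most `size` elements."""
    for start in range(0, len(seq), size):
        yield seq[start:start + size]

# Example usage against a group `g` and a node list `flat_list`:
# for batch in chunks(flat_list, 100):
#     g.add_nodes(batch)
```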

(aiidapy) aiida@ubuntu-aiida-vm1:~/issue_1319$ verdi -p sqla_issue_1319 run test_script.py
#1 All available groups [[<Group: "Verdi autogroup on 2018-04-05 19:43:19" [type autogroup.run], of user aiida@localhost>], [<Group: "test_group" [user-defined], of user aiida@localhost>]]
Deleting the group  "test_group" [user-defined], of user aiida@localhost
#2 All available groups [[<Group: "Verdi autogroup on 2018-04-05 19:43:19" [type autogroup.run], of user aiida@localhost>]]
Creating a temp group
Number of available Cif data 3019
Adding all nodes
Time elapsed in ms 2.65530991554
The size of the group is  2
Deleting the created group
Creating a temp group
Number of available Cif data 3019
Adding nodes one by one
Counter: 100, Time elapsed in ms: 2.78353905678, Throughput (time per 100 nodes): 2.78353905678
Counter: 200, Time elapsed in ms: 6.87580800056, Throughput (time per 100 nodes): 4.09226894379
Counter: 300, Time elapsed in ms: 13.4311919212, Throughput (time per 100 nodes): 6.55538392067
Counter: 400, Time elapsed in ms: 24.9477529526, Throughput (time per 100 nodes): 11.5165610313
Counter: 500, Time elapsed in ms: 35.5219419003, Throughput (time per 100 nodes): 10.5741889477
Counter: 600, Time elapsed in ms: 47.4421348572, Throughput (time per 100 nodes): 11.9201929569
Counter: 700, Time elapsed in ms: 62.1139850616, Throughput (time per 100 nodes): 14.6718502045
Counter: 800, Time elapsed in ms: 77.7739970684, Throughput (time per 100 nodes): 15.6600120068
Counter: 900, Time elapsed in ms: 95.849711895, Throughput (time per 100 nodes): 18.0757148266
Counter: 1000, Time elapsed in ms: 115.185575008, Throughput (time per 100 nodes): 19.3358631134
Time elapsed in ms 115.187716007
The size of the group is  2
[[<Group: "Verdi autogroup on 2018-04-05 19:43:19" [type autogroup.run], of user aiida@localhost>], [<Group: "test_group" [user-defined], of user aiida@localhost>]]
"Verdi autogroup on 2018-04-05 19:43:19" [type autogroup.run], of user aiida@localhost
name ==> Verdi autogroup on 2018-04-05 19:43:19
desc ==>
The size of the group is  1000
Deleting the created group
Creating a temp group
Number of available Cif data 3019
Adding nodes in batches of 100
Counter: 1, Time elapsed in ms: 0.281466960907, Throughput (time per 100 nodes): 0.281466960907, chunk size: 100
Counter: 2, Time elapsed in ms: 1.41413497925, Throughput (time per 100 nodes): 1.13266801834, chunk size: 100
Counter: 3, Time elapsed in ms: 2.93307185173, Throughput (time per 100 nodes): 1.51893687248, chunk size: 100
Counter: 4, Time elapsed in ms: 5.00314688683, Throughput (time per 100 nodes): 2.0700750351, chunk size: 100
Counter: 5, Time elapsed in ms: 7.94263100624, Throughput (time per 100 nodes): 2.93948411942, chunk size: 100
Counter: 6, Time elapsed in ms: 11.2633240223, Throughput (time per 100 nodes): 3.32069301605, chunk size: 100
Counter: 7, Time elapsed in ms: 15.409528017, Throughput (time per 100 nodes): 4.14620399475, chunk size: 100
Counter: 8, Time elapsed in ms: 19.8101239204, Throughput (time per 100 nodes): 4.4005959034, chunk size: 100
Counter: 9, Time elapsed in ms: 25.0399589539, Throughput (time per 100 nodes): 5.22983503342, chunk size: 100
Counter: 10, Time elapsed in ms: 32.1085138321, Throughput (time per 100 nodes): 7.06855487823, chunk size: 100
Time elapsed in ms 32.1085708141
The size of the group is  1000

We can bypass the ORM by executing an SQL statement directly to update the relationship, as follows:

        values = ", ".join(
            "({}, {})".format(nid, self._dbgroup.id) for nid in list_dbnodes_ids)

        statement = """INSERT INTO db_dbgroup_dbnodes(dbnode_id, dbgroup_id) VALUES {}""".format(
            values)
        session.execute(statement)

but the performance gains are not amazing
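As a side note, the same multi-row insert can also be issued with bound parameters instead of string formatting, so the driver handles value escaping while the whole batch still travels in a single statement. A self-contained sketch using Python's stdlib sqlite3 purely for illustration (the real code targets PostgreSQL through the SQLA session; only the table and column names mirror the snippet above):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE db_dbgroup_dbnodes (dbnode_id INTEGER, dbgroup_id INTEGER)")

group_id = 1
node_ids = [10, 11, 12]

# One multi-row INSERT with bound parameters: one "(?, ?)" pair per row,
# parameters flattened in the same order.
placeholders = ", ".join("(?, ?)" for _ in node_ids)
params = [value for nid in node_ids for value in (nid, group_id)]
conn.execute(
    "INSERT INTO db_dbgroup_dbnodes (dbnode_id, dbgroup_id) VALUES " + placeholders,
    params,
)
conn.commit()

count = conn.execute("SELECT COUNT(*) FROM db_dbgroup_dbnodes").fetchone()[0]
print(count)  # 3
```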

(aiidapy) aiida@ubuntu-aiida-vm1:~/issue_1319$ verdi -p sqla_issue_1319 run test_script.py
#1 All available groups [[<Group: "Verdi autogroup on 2018-04-05 19:43:19" [type autogroup.run], of user aiida@localhost>], [<Group: "test_group" [user-defined], of user aiida@localhost>]]
Deleting the group  "test_group" [user-defined], of user aiida@localhost
#2 All available groups [[<Group: "Verdi autogroup on 2018-04-05 19:43:19" [type autogroup.run], of user aiida@localhost>]]
Creating a temp group
Number of available Cif data 3019
Adding all nodes
Time elapsed in ms 2.24172091484
The size of the group is  2
Deleting the created group
Creating a temp group
Number of available Cif data 3019
Adding nodes one by one
Counter: 100, Time elapsed in ms: 1.16023612022, Throughput (time per 100 nodes): 1.16023612022
Counter: 200, Time elapsed in ms: 2.40981698036, Throughput (time per 100 nodes): 1.24958086014
Counter: 300, Time elapsed in ms: 4.33016419411, Throughput (time per 100 nodes): 1.92034721375
Counter: 400, Time elapsed in ms: 6.88652300835, Throughput (time per 100 nodes): 2.55635881424
Counter: 500, Time elapsed in ms: 10.0732610226, Throughput (time per 100 nodes): 3.18673801422
Counter: 600, Time elapsed in ms: 16.0039260387, Throughput (time per 100 nodes): 5.93066501617
Counter: 700, Time elapsed in ms: 20.9258131981, Throughput (time per 100 nodes): 4.92188715935
Counter: 800, Time elapsed in ms: 26.4910550117, Throughput (time per 100 nodes): 5.56524181366
Counter: 900, Time elapsed in ms: 32.5509080887, Throughput (time per 100 nodes): 6.05985307693
Counter: 1000, Time elapsed in ms: 39.3975532055, Throughput (time per 100 nodes): 6.84664511681
Time elapsed in ms 39.3997671604
The size of the group is  2
[[<Group: "Verdi autogroup on 2018-04-05 19:43:19" [type autogroup.run], of user aiida@localhost>], [<Group: "test_group" [user-defined], of user aiida@localhost>]]
"Verdi autogroup on 2018-04-05 19:43:19" [type autogroup.run], of user aiida@localhost
name ==> Verdi autogroup on 2018-04-05 19:43:19
desc ==>
The size of the group is  1000
Deleting the created group
Creating a temp group
Number of available Cif data 3019
Adding nodes in batches of 100
Counter: 1, Time elapsed in ms: 0.254683017731, Throughput (time per 100 nodes): 0.254683017731, chunk size: 100
Counter: 2, Time elapsed in ms: 1.15336799622, Throughput (time per 100 nodes): 0.898684978485, chunk size: 100
Counter: 3, Time elapsed in ms: 2.6940279007, Throughput (time per 100 nodes): 1.54065990448, chunk size: 100
Counter: 4, Time elapsed in ms: 5.09806203842, Throughput (time per 100 nodes): 2.40403413773, chunk size: 100
Counter: 5, Time elapsed in ms: 7.89258599281, Throughput (time per 100 nodes): 2.79452395439, chunk size: 100
Counter: 6, Time elapsed in ms: 11.2417330742, Throughput (time per 100 nodes): 3.34914708138, chunk size: 100
Counter: 7, Time elapsed in ms: 15.7686090469, Throughput (time per 100 nodes): 4.52687597275, chunk size: 100
Counter: 8, Time elapsed in ms: 20.4742219448, Throughput (time per 100 nodes): 4.70561289787, chunk size: 100
Counter: 9, Time elapsed in ms: 25.9905340672, Throughput (time per 100 nodes): 5.51631212234, chunk size: 100
Counter: 10, Time elapsed in ms: 34.8677790165, Throughput (time per 100 nodes): 8.87724494934, chunk size: 100
Time elapsed in ms 34.868571043
The size of the group is  1000

@szoupanos
Copy link
Contributor

What is amazing is the SQL INSERT statement with a session.expunge_all() at the beginning: it is faster than Django, but we have to do it properly so as not to mess everything up.

(aiidapy) aiida@ubuntu-aiida-vm1:~/issue_1319$ verdi -p sqla_issue_1319 run test_script.py
#1 All available groups [[<Group: "Verdi autogroup on 2018-04-05 19:43:19" [type autogroup.run], of user aiida@localhost>], [<Group: "test_group" [user-defined], of user aiida@localhost>]]
Deleting the group  "test_group" [user-defined], of user aiida@localhost
#2 All available groups [[<Group: "Verdi autogroup on 2018-04-05 19:43:19" [type autogroup.run], of user aiida@localhost>]]
Creating a temp group
Number of available Cif data 3019
Adding all nodes
Time elapsed in ms 0.0952980518341
The size of the group is  2
Deleting the created group
Creating a temp group
Number of available Cif data 3019
Adding nodes one by one
Counter: 100, Time elapsed in ms: 0.492502212524, Throughput (time per 100 nodes): 0.492502212524
Counter: 200, Time elapsed in ms: 0.59246301651, Throughput (time per 100 nodes): 0.0999608039856
Counter: 300, Time elapsed in ms: 0.692285060883, Throughput (time per 100 nodes): 0.0998220443726
Counter: 400, Time elapsed in ms: 0.793632030487, Throughput (time per 100 nodes): 0.101346969604
Counter: 500, Time elapsed in ms: 0.894867181778, Throughput (time per 100 nodes): 0.101235151291
Counter: 600, Time elapsed in ms: 0.995048999786, Throughput (time per 100 nodes): 0.100181818008
Counter: 700, Time elapsed in ms: 1.09379220009, Throughput (time per 100 nodes): 0.0987432003021
Counter: 800, Time elapsed in ms: 1.19934821129, Throughput (time per 100 nodes): 0.1055560112
Counter: 900, Time elapsed in ms: 1.30081105232, Throughput (time per 100 nodes): 0.101462841034
Counter: 1000, Time elapsed in ms: 1.40256309509, Throughput (time per 100 nodes): 0.10175204277
Time elapsed in ms 1.40450906754
The size of the group is  2
[[<Group: "Verdi autogroup on 2018-04-05 19:43:19" [type autogroup.run], of user aiida@localhost>], [<Group: "test_group" [user-defined], of user aiida@localhost>]]
"Verdi autogroup on 2018-04-05 19:43:19" [type autogroup.run], of user aiida@localhost
name ==> Verdi autogroup on 2018-04-05 19:43:19
desc ==>
The size of the group is  1000
Deleting the created group
Creating a temp group
Number of available Cif data 3019
Adding nodes in batches of 100
Counter: 1, Time elapsed in ms: 0.015928030014, Throughput (time per 100 nodes): 0.015928030014, chunk size: 100
Counter: 2, Time elapsed in ms: 0.08873295784, Throughput (time per 100 nodes): 0.0728049278259, chunk size: 100
Counter: 3, Time elapsed in ms: 0.0982210636139, Throughput (time per 100 nodes): 0.00948810577393, chunk size: 100
Counter: 4, Time elapsed in ms: 0.109713077545, Throughput (time per 100 nodes): 0.0114920139313, chunk size: 100
Counter: 5, Time elapsed in ms: 0.126136064529, Throughput (time per 100 nodes): 0.0164229869843, chunk size: 100
Counter: 6, Time elapsed in ms: 0.136846065521, Throughput (time per 100 nodes): 0.0107100009918, chunk size: 100
Counter: 7, Time elapsed in ms: 0.158164024353, Throughput (time per 100 nodes): 0.0213179588318, chunk size: 100
Counter: 8, Time elapsed in ms: 0.181298017502, Throughput (time per 100 nodes): 0.0231339931488, chunk size: 100
Counter: 9, Time elapsed in ms: 0.200289011002, Throughput (time per 100 nodes): 0.0189909934998, chunk size: 100
Counter: 10, Time elapsed in ms: 0.214483976364, Throughput (time per 100 nodes): 0.0141949653625, chunk size: 100
Time elapsed in ms 0.214550971985
The size of the group is  1000


@szoupanos
Copy link
Contributor

szoupanos commented May 4, 2018

I think I have found a nice solution, inspired by our discussions, which I am going to implement if there is no objection (@giovannipizzi, @ltalirz - we can also talk if needed).

I will keep an optimized version of the ORM-based implementation (still slow compared to Django), and I will also add a PostgreSQL-specific SQL implementation using an SQL INSERT statement that does not need session.expunge_all() and keeps the session objects up to date (so after the operation, the ORM group objects will have the right members etc.). It is also not affected by the expire_on_commit option.

The SQL version will be faster than the existing Django implementation (half the time or even less).

We will be able to choose the desired SQLA group-add code path using a flag.
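The PostgreSQL-specific path described above boils down to a multi-row INSERT into the association table; once a UNIQUE constraint on (dbgroup_id, dbnode_id) exists, duplicate membership can be skipped by the database itself rather than checked in Python. A minimal sketch of that idea using stdlib sqlite3 (the table name mirrors AiiDA's; PostgreSQL would spell the conflict handling as `INSERT ... ON CONFLICT DO NOTHING`, SQLite as `INSERT OR IGNORE`):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE db_dbgroup_dbnodes ("
    " dbnode_id INTEGER, dbgroup_id INTEGER,"
    " UNIQUE (dbgroup_id, dbnode_id))"
)

group_id = 1
conn.executemany(
    # SQLite spelling; on PostgreSQL this would be INSERT ... ON CONFLICT DO NOTHING.
    "INSERT OR IGNORE INTO db_dbgroup_dbnodes (dbnode_id, dbgroup_id) VALUES (?, ?)",
    [(nid, group_id) for nid in [10, 11, 11, 12, 10]],  # duplicates on purpose
)
conn.commit()

count = conn.execute("SELECT COUNT(*) FROM db_dbgroup_dbnodes").fetchone()[0]
print(count)  # 3: the two duplicate rows were silently skipped
```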

The SQLA with SQL insert

(aiidapy) aiida@ubuntu-aiida-vm1:~/issue_1319$ verdi -p sqla_issue_1319 run test_script.py
#1 All available groups [[<Group: "Verdi autogroup on 2018-04-05 19:43:19" [type autogroup.run], of user aiida@localhost>], [<Group: "test_group" [user-defined], of user aiida@localhost>], [<Group: "Verdi autogroup on 2018-05-04 15:47:59" [type autogroup.run], of user aiida@localhost>]]
Deleting the group  "test_group" [user-defined], of user aiida@localhost
#2 All available groups [[<Group: "Verdi autogroup on 2018-04-05 19:43:19" [type autogroup.run], of user aiida@localhost>], [<Group: "Verdi autogroup on 2018-05-04 15:47:59" [type autogroup.run], of user aiida@localhost>]]
Creating a temp group
Number of available Cif data 13019
Adding 12000 nodes
Time elapsed in ms 0.191395998001
The size of the group is  12000
Deleting the created group
Creating a temp group
Number of available Cif data 13019
Adding nodes one by one
Counter: 100, Time elapsed in ms: 0.0865461826324, Throughput (time per 100 nodes): 0.0865461826324
Counter: 200, Time elapsed in ms: 0.196799993515, Throughput (time per 100 nodes): 0.110253810883
Counter: 300, Time elapsed in ms: 0.293050050735, Throughput (time per 100 nodes): 0.0962500572205
Counter: 400, Time elapsed in ms: 0.397217035294, Throughput (time per 100 nodes): 0.104166984558
Counter: 500, Time elapsed in ms: 0.489093065262, Throughput (time per 100 nodes): 0.0918760299683
Counter: 600, Time elapsed in ms: 0.585044145584, Throughput (time per 100 nodes): 0.0959510803223
Counter: 700, Time elapsed in ms: 0.677579164505, Throughput (time per 100 nodes): 0.0925350189209
Counter: 800, Time elapsed in ms: 0.750379085541, Throughput (time per 100 nodes): 0.0727999210358
Counter: 900, Time elapsed in ms: 0.819831132889, Throughput (time per 100 nodes): 0.069452047348
Counter: 1000, Time elapsed in ms: 0.913388967514, Throughput (time per 100 nodes): 0.0935578346252
Time elapsed in ms 0.913459062576
The size of the group is  3
Re-adding nodes one by one (the same ones)
Counter: 100, Time elapsed in ms: 0.095535993576, Throughput (time per 100 nodes): 0.095535993576
Counter: 200, Time elapsed in ms: 0.184896945953, Throughput (time per 100 nodes): 0.0893609523773
Counter: 300, Time elapsed in ms: 0.275234937668, Throughput (time per 100 nodes): 0.0903379917145
Counter: 400, Time elapsed in ms: 0.367427110672, Throughput (time per 100 nodes): 0.0921921730042
Counter: 500, Time elapsed in ms: 0.458844900131, Throughput (time per 100 nodes): 0.0914177894592
Counter: 600, Time elapsed in ms: 0.547655105591, Throughput (time per 100 nodes): 0.0888102054596
Counter: 700, Time elapsed in ms: 0.614478111267, Throughput (time per 100 nodes): 0.0668230056763
Counter: 800, Time elapsed in ms: 0.682828903198, Throughput (time per 100 nodes): 0.0683507919312
Counter: 900, Time elapsed in ms: 0.746294975281, Throughput (time per 100 nodes): 0.0634660720825
Counter: 1000, Time elapsed in ms: 0.831932067871, Throughput (time per 100 nodes): 0.0856370925903
Time elapsed in ms 0.831996917725
The size of the group is  1000
Deleting the created group
Creating a temp group
Number of available Cif data 13019
Adding nodes in batches of 100
Counter: 1, Time elapsed in ms: 0.0213191509247, Throughput (time per 100 nodes): 0.0213191509247, chunk size: 100
Counter: 2, Time elapsed in ms: 0.0245561599731, Throughput (time per 100 nodes): 0.00323700904846, chunk size: 100
Counter: 3, Time elapsed in ms: 0.0274901390076, Throughput (time per 100 nodes): 0.00293397903442, chunk size: 100
Counter: 4, Time elapsed in ms: 0.030385017395, Throughput (time per 100 nodes): 0.00289487838745, chunk size: 100
Counter: 5, Time elapsed in ms: 0.0333781242371, Throughput (time per 100 nodes): 0.00299310684204, chunk size: 100
Counter: 6, Time elapsed in ms: 0.0378761291504, Throughput (time per 100 nodes): 0.00449800491333, chunk size: 100
Counter: 7, Time elapsed in ms: 0.0432991981506, Throughput (time per 100 nodes): 0.00542306900024, chunk size: 100
Counter: 8, Time elapsed in ms: 0.0464789867401, Throughput (time per 100 nodes): 0.00317978858948, chunk size: 100
Counter: 9, Time elapsed in ms: 0.0495121479034, Throughput (time per 100 nodes): 0.00303316116333, chunk size: 100
Counter: 10, Time elapsed in ms: 0.0523011684418, Throughput (time per 100 nodes): 0.00278902053833, chunk size: 100
Time elapsed in ms 0.0523722171783
The size of the group is  1000

The Django version

(aiidapy) aiida@ubuntu-aiida-vm1:~/issue_1319$ verdi -p dj_issue_1319 run test_script.py
#1 All available groups [[<Group: "Verdi autogroup on 2018-04-05 19:53:08" [type autogroup.run], of user aiida@localhost>], [<Group: "Verdi autogroup on 2018-05-04 15:12:54" [type autogroup.run], of user aiida@localhost>], [<Group: "test_group" [user-defined], of user aiida@localhost>]]
Deleting the group  "test_group" [user-defined], of user aiida@localhost
#2 All available groups [[<Group: "Verdi autogroup on 2018-04-05 19:53:08" [type autogroup.run], of user aiida@localhost>], [<Group: "Verdi autogroup on 2018-05-04 15:12:54" [type autogroup.run], of user aiida@localhost>]]
Creating a temp group
Number of available Cif data 14064
Adding 12000 nodes
Time elapsed in ms 0.807929992676
The size of the group is  12000
Deleting the created group
Creating a temp group
Number of available Cif data 14064
Adding nodes one by one
Counter: 100, Time elapsed in ms: 0.453313827515, Throughput (time per 100 nodes): 0.453313827515
Counter: 200, Time elapsed in ms: 0.9067029953, Throughput (time per 100 nodes): 0.453389167786
Counter: 300, Time elapsed in ms: 1.36614704132, Throughput (time per 100 nodes): 0.459444046021
Counter: 400, Time elapsed in ms: 1.79614591599, Throughput (time per 100 nodes): 0.429998874664
Counter: 500, Time elapsed in ms: 2.23816990852, Throughput (time per 100 nodes): 0.442023992538
Counter: 600, Time elapsed in ms: 2.74618887901, Throughput (time per 100 nodes): 0.50801897049
Counter: 700, Time elapsed in ms: 3.24038505554, Throughput (time per 100 nodes): 0.494196176529
Counter: 800, Time elapsed in ms: 3.7128238678, Throughput (time per 100 nodes): 0.472438812256
Counter: 900, Time elapsed in ms: 4.19122886658, Throughput (time per 100 nodes): 0.478404998779
Counter: 1000, Time elapsed in ms: 4.64931583405, Throughput (time per 100 nodes): 0.458086967468
Time elapsed in ms 4.64941191673
The size of the group is  3
Re-adding nodes one by one (the same ones)
Counter: 100, Time elapsed in ms: 0.172289848328, Throughput (time per 100 nodes): 0.172289848328
Counter: 200, Time elapsed in ms: 0.372611045837, Throughput (time per 100 nodes): 0.20032119751
Counter: 300, Time elapsed in ms: 0.56739282608, Throughput (time per 100 nodes): 0.194781780243
Counter: 400, Time elapsed in ms: 0.761443853378, Throughput (time per 100 nodes): 0.194051027298
Counter: 500, Time elapsed in ms: 0.939738035202, Throughput (time per 100 nodes): 0.178294181824
Counter: 600, Time elapsed in ms: 1.14302301407, Throughput (time per 100 nodes): 0.203284978867
Counter: 700, Time elapsed in ms: 1.33699083328, Throughput (time per 100 nodes): 0.193967819214
Counter: 800, Time elapsed in ms: 1.52929902077, Throughput (time per 100 nodes): 0.192308187485
Counter: 900, Time elapsed in ms: 1.76694893837, Throughput (time per 100 nodes): 0.237649917603
Counter: 1000, Time elapsed in ms: 1.97885894775, Throughput (time per 100 nodes): 0.211910009384
Time elapsed in ms 1.97891497612
The size of the group is  1000
Deleting the created group
Creating a temp group
Number of available Cif data 14064
Adding nodes in batches of 100
Counter: 1, Time elapsed in ms: 0.034460067749, Throughput (time per 100 nodes): 0.034460067749, chunk size: 100
Counter: 2, Time elapsed in ms: 0.0495519638062, Throughput (time per 100 nodes): 0.0150918960571, chunk size: 100
Counter: 3, Time elapsed in ms: 0.0624499320984, Throughput (time per 100 nodes): 0.0128979682922, chunk size: 100
Counter: 4, Time elapsed in ms: 0.0760219097137, Throughput (time per 100 nodes): 0.0135719776154, chunk size: 100
Counter: 5, Time elapsed in ms: 0.0910680294037, Throughput (time per 100 nodes): 0.0150461196899, chunk size: 100
Counter: 6, Time elapsed in ms: 0.102406978607, Throughput (time per 100 nodes): 0.0113389492035, chunk size: 100
Counter: 7, Time elapsed in ms: 0.114480018616, Throughput (time per 100 nodes): 0.0120730400085, chunk size: 100
Counter: 8, Time elapsed in ms: 0.128844976425, Throughput (time per 100 nodes): 0.0143649578094, chunk size: 100
Counter: 9, Time elapsed in ms: 0.144758939743, Throughput (time per 100 nodes): 0.0159139633179, chunk size: 100
Counter: 10, Time elapsed in ms: 0.155833005905, Throughput (time per 100 nodes): 0.0110740661621, chunk size: 100
Time elapsed in ms 0.15589094162
The size of the group is  1000

@ltalirz
Copy link
Member

ltalirz commented May 4, 2018

@szoupanos Thanks a lot for your work on this, it really matters to us!

As hoped, performance-wise this is an improvement of several orders of magnitude. Your numbers for adding nodes in batches of 100 with SQLA are not very consistent (they vary by an order of magnitude), but they are really in a different league than before.
I assume adding 10k/100k nodes in one go will not even be 100x/1000x as expensive as adding 100 nodes (perhaps you can test this at some point).

Happy to talk to you on Monday to get a better understanding of how it works, but from our side this is great!

@muhrin
Copy link
Contributor

muhrin commented Jun 7, 2018

@szoupanos Can you please have a look at this:

https://github.com/muhrin/aiida_core/tree/fix_1319_group_adding_slow

Description:
I'm trying a way to add nodes to a group that doesn't require an explicit
check to see whether the node is already there. There are several parts to
this:

  1. A UNIQUE constraint needs to be added to the association table, as we
    have in Django. Something like:
    CREATE UNIQUE INDEX db_dbgroup_dbnodes_dbgroup_id_dbnode_id_key ON public.db_dbgroup_dbnodes (dbgroup_id, dbnode_id);
  2. Add the nodes one by one and flush after each one, which will raise an
    IntegrityError if the constraint is violated
  3. Wrap the append/flush in a nested session so that it is automatically
    rolled back in the case of a constraint violation
  4. Finally, after everything, do the final commit (because flush only puts
    the statements in SQL's buffers)

NOTE: Before this can be considered for a merge, we have to find a way to
put the constraint in the model; I didn't see how to do this.
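Steps 2-3 above (flush per node inside a nested transaction, roll back only that node on a constraint violation) can be illustrated with stdlib sqlite3 savepoints, which play the role of SQLA's `session.begin_nested()`; the table and constraint mirror the ones described above, the rest is an illustrative sketch:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.isolation_level = None  # manage transactions manually
conn.execute(
    "CREATE TABLE db_dbgroup_dbnodes ("
    " dbnode_id INTEGER, dbgroup_id INTEGER,"
    " UNIQUE (dbgroup_id, dbnode_id))"
)

group_id = 1
conn.execute("BEGIN")
for nid in [10, 11, 11, 12]:  # 11 appears twice: the second insert violates UNIQUE
    conn.execute("SAVEPOINT add_node")  # analogous to session.begin_nested()
    try:
        conn.execute(
            "INSERT INTO db_dbgroup_dbnodes (dbnode_id, dbgroup_id) VALUES (?, ?)",
            (nid, group_id),
        )
        conn.execute("RELEASE SAVEPOINT add_node")
    except sqlite3.IntegrityError:
        # Roll back only this node's insert; the rest of the batch survives.
        conn.execute("ROLLBACK TO SAVEPOINT add_node")
        conn.execute("RELEASE SAVEPOINT add_node")
conn.execute("COMMIT")  # the single final commit of step 4

count = conn.execute("SELECT COUNT(*) FROM db_dbgroup_dbnodes").fetchone()[0]
print(count)  # 3
```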

BEFORE:

Creating a temp group
Number of available Cif data 10000
Adding nodes one by one
Counter: 100, Time elspsed in ms: 4.6177740097, Throughput (time per 100 nodes): 4.6177740097
Counter: 200, Time elspsed in ms: 7.62700915337, Throughput (time per 100 nodes): 3.00923514366
Counter: 300, Time elspsed in ms: 10.8559751511, Throughput (time per 100 nodes): 3.2289659977
Counter: 400, Time elspsed in ms: 14.5526180267, Throughput (time per 100 nodes): 3.69664287567
Counter: 500, Time elspsed in ms: 18.8845729828, Throughput (time per 100 nodes): 4.33195495605
Counter: 600, Time elspsed in ms: 23.8332829475, Throughput (time per 100 nodes): 4.94870996475
Counter: 700, Time elspsed in ms: 29.1641230583, Throughput (time per 100 nodes): 5.33084011078
Counter: 800, Time elspsed in ms: 35.1758539677, Throughput (time per 100 nodes): 6.01173090935
Counter: 900, Time elspsed in ms: 41.59299016, Throughput (time per 100 nodes): 6.41713619232
Counter: 1000, Time elspsed in ms: 48.6356980801, Throughput (time per 100 nodes): 7.04270792007
Time elapsed in ms 48.6389050484
Deleting the created group
Creating a temp group
Adding nodes in batches of 100
Counter: 1, Time elspsed in ms: 2.91910505295, Throughput (time per 100 nodes): 2.91910505295, chunk size: 100
Counter: 2, Time elspsed in ms: 3.81247615814, Throughput (time per 100 nodes): 0.893371105194, chunk size: 100
Counter: 3, Time elspsed in ms: 5.10625600815, Throughput (time per 100 nodes): 1.29377985001, chunk size: 100
Counter: 4, Time elspsed in ms: 6.89171504974, Throughput (time per 100 nodes): 1.7854590416, chunk size: 100
Counter: 5, Time elspsed in ms: 9.12970519066, Throughput (time per 100 nodes): 2.23799014091, chunk size: 100
Counter: 6, Time elspsed in ms: 11.8159332275, Throughput (time per 100 nodes): 2.68622803688, chunk size: 100
Counter: 7, Time elspsed in ms: 14.982817173, Throughput (time per 100 nodes): 3.16688394547, chunk size: 100
Counter: 8, Time elspsed in ms: 18.6862950325, Throughput (time per 100 nodes): 3.7034778595, chunk size: 100
Counter: 9, Time elspsed in ms: 22.7971901894, Throughput (time per 100 nodes): 4.11089515686, chunk size: 100
Counter: 10, Time elspsed in ms: 27.4252841473, Throughput (time per 100 nodes): 4.6280939579, chunk size: 100
Time elapsed in ms 27.4253129959

AFTER:

Creating a temp group
Number of available Cif data 10000
Adding nodes one by one
Counter: 100, Time elspsed in ms: 4.20299100876, Throughput (time per 100 nodes): 4.20299100876
Counter: 200, Time elspsed in ms: 6.07379484177, Throughput (time per 100 nodes): 1.87080383301
Counter: 300, Time elspsed in ms: 7.84407901764, Throughput (time per 100 nodes): 1.77028417587
Counter: 400, Time elspsed in ms: 9.63817286491, Throughput (time per 100 nodes): 1.79409384727
Counter: 500, Time elspsed in ms: 11.4495809078, Throughput (time per 100 nodes): 1.81140804291
Counter: 600, Time elspsed in ms: 13.2314128876, Throughput (time per 100 nodes): 1.78183197975
Counter: 700, Time elspsed in ms: 15.0530498028, Throughput (time per 100 nodes): 1.82163691521
Counter: 800, Time elspsed in ms: 16.7907588482, Throughput (time per 100 nodes): 1.73770904541
Counter: 900, Time elspsed in ms: 18.5276179314, Throughput (time per 100 nodes): 1.73685908318
Counter: 1000, Time elspsed in ms: 20.236964941, Throughput (time per 100 nodes): 1.70934700966
Time elapsed in ms 20.2405498028
Deleting the created group
Creating a temp group
Adding nodes in batches of 100
Counter: 1, Time elspsed in ms: 2.76226806641, Throughput (time per 100 nodes): 2.76226806641, chunk size: 100
Counter: 2, Time elspsed in ms: 3.27148509026, Throughput (time per 100 nodes): 0.509217023849, chunk size: 100
Counter: 3, Time elspsed in ms: 3.71189403534, Throughput (time per 100 nodes): 0.440408945084, chunk size: 100
Counter: 4, Time elspsed in ms: 4.12742805481, Throughput (time per 100 nodes): 0.41553401947, chunk size: 100
Counter: 5, Time elspsed in ms: 4.56759214401, Throughput (time per 100 nodes): 0.440164089203, chunk size: 100
Counter: 6, Time elspsed in ms: 5.03818511963, Throughput (time per 100 nodes): 0.470592975616, chunk size: 100
Counter: 7, Time elspsed in ms: 5.45378518105, Throughput (time per 100 nodes): 0.415600061417, chunk size: 100
Counter: 8, Time elspsed in ms: 5.88933300972, Throughput (time per 100 nodes): 0.435547828674, chunk size: 100
Counter: 9, Time elspsed in ms: 6.29954600334, Throughput (time per 100 nodes): 0.410212993622, chunk size: 100
Counter: 10, Time elspsed in ms: 6.71973013878, Throughput (time per 100 nodes): 0.420184135437, chunk size: 100
Time elapsed in ms 6.71975111961

@szoupanos
Copy link
Contributor

Thanks a lot, Martin, but in my tests I don't see the big improvement that you see with your changes; the SQLA code remains 2 orders of magnitude slower than the Django code (on group addition).

Some numbers from the tests on my VM:

SQLA - original code

(aiidapy) aiida@ubuntu-aiida-vm1:~/issue_1319$ verdi -p sqla_issue_1319 run test_script_2.py
#1 All available groups [[<Group: "Verdi autogroup on 2018-04-05 19:43:19" [type autogroup.run], of user aiida@localhost>], [<Group: "test_group" [user-defined], of user aiida@localhost>], [<Group: "Verdi autogroup on 2018-05-04 15:47:59" [type autogroup.run], of user aiida@localhost>]]
Deleting the group  "test_group" [user-defined], of user aiida@localhost
#2 All available groups [[<Group: "Verdi autogroup on 2018-04-05 19:43:19" [type autogroup.run], of user aiida@localhost>], [<Group: "Verdi autogroup on 2018-05-04 15:47:59" [type autogroup.run], of user aiida@localhost>]]
Creating a temp group
Number of available Cif data 13019
Adding 1200 nodes
Time elapsed in ms 77.4165370464
The size of the group is  1200
Deleting the created group
Creating a temp group
Number of available Cif data 13019
Adding nodes one by one
Counter: 100, Time elapsed in ms: 12.5682499409, Throughput (time per 100 nodes): 12.5682499409
Counter: 200, Time elapsed in ms: 25.4672391415, Throughput (time per 100 nodes): 12.8989892006
Counter: 300, Time elapsed in ms: 39.6301369667, Throughput (time per 100 nodes): 14.1628978252
Counter: 400, Time elapsed in ms: 54.6256980896, Throughput (time per 100 nodes): 14.9955611229
Counter: 500, Time elapsed in ms: 70.004997015, Throughput (time per 100 nodes): 15.3792989254
Counter: 600, Time elapsed in ms: 85.4559469223, Throughput (time per 100 nodes): 15.4509499073
Counter: 700, Time elapsed in ms: 101.881716967, Throughput (time per 100 nodes): 16.4257700443
Counter: 800, Time elapsed in ms: 118.777574062, Throughput (time per 100 nodes): 16.8958570957
Counter: 900, Time elapsed in ms: 141.36629796, Throughput (time per 100 nodes): 22.5887238979
Counter: 1000, Time elapsed in ms: 162.874361992, Throughput (time per 100 nodes): 21.5080640316
Time elapsed in ms 162.874467134
The size of the group is  3
Re-adding nodes one by one (the same ones)
Counter: 100, Time elapsed in ms: 16.2928650379, Throughput (time per 100 nodes): 16.2928650379
Counter: 200, Time elapsed in ms: 32.553622961, Throughput (time per 100 nodes): 16.2607579231
Counter: 300, Time elapsed in ms: 52.4816901684, Throughput (time per 100 nodes): 19.9280672073
Counter: 400, Time elapsed in ms: 70.2084860802, Throughput (time per 100 nodes): 17.7267959118
Counter: 500, Time elapsed in ms: 86.4412300587, Throughput (time per 100 nodes): 16.2327439785
Counter: 600, Time elapsed in ms: 102.860389948, Throughput (time per 100 nodes): 16.4191598892
Counter: 700, Time elapsed in ms: 122.454787016, Throughput (time per 100 nodes): 19.594397068
Counter: 800, Time elapsed in ms: 139.391860008, Throughput (time per 100 nodes): 16.9370729923
Counter: 900, Time elapsed in ms: 155.843513966, Throughput (time per 100 nodes): 16.4516539574
Counter: 1000, Time elapsed in ms: 173.634629011, Throughput (time per 100 nodes): 17.7911150455
Time elapsed in ms 173.634781122
The size of the group is  1000
Deleting the created group
Creating a temp group
Number of available Cif data 13019
Adding nodes in batches of 100
Counter: 1, Time elapsed in ms: 2.31874012947, Throughput (time per 100 nodes): 2.31874012947, chunk size: 100
Counter: 2, Time elapsed in ms: 5.32028722763, Throughput (time per 100 nodes): 3.00154709816, chunk size: 100
Counter: 3, Time elapsed in ms: 8.79141902924, Throughput (time per 100 nodes): 3.47113180161, chunk size: 100
Counter: 4, Time elapsed in ms: 12.915938139, Throughput (time per 100 nodes): 4.12451910973, chunk size: 100
Counter: 5, Time elapsed in ms: 17.9667701721, Throughput (time per 100 nodes): 5.05083203316, chunk size: 100
Counter: 6, Time elapsed in ms: 23.8949911594, Throughput (time per 100 nodes): 5.92822098732, chunk size: 100
Counter: 7, Time elapsed in ms: 30.1269760132, Throughput (time per 100 nodes): 6.23198485374, chunk size: 100
Counter: 8, Time elapsed in ms: 37.1440711021, Throughput (time per 100 nodes): 7.01709508896, chunk size: 100
Counter: 9, Time elapsed in ms: 44.9562780857, Throughput (time per 100 nodes): 7.81220698357, chunk size: 100
Counter: 10, Time elapsed in ms: 52.8497531414, Throughput (time per 100 nodes): 7.89347505569, chunk size: 100
Time elapsed in ms 52.8498101234
The size of the group is  1000

SQLA with subsession

(aiidapy) aiida@ubuntu-aiida-vm1:~/issue_1319$ verdi -p sqla_issue_1319 run test_script_2.py
#1 All available groups [[<Group: "Verdi autogroup on 2018-04-05 19:43:19" [type autogroup.run], of user aiida@localhost>], [<Group: "test_group" [user-defined], of user aiida@localhost>], [<Group: "Verdi autogroup on 2018-05-04 15:47:59" [type autogroup.run], of user aiida@localhost>]]
Deleting the group  "test_group" [user-defined], of user aiida@localhost
#2 All available groups [[<Group: "Verdi autogroup on 2018-04-05 19:43:19" [type autogroup.run], of user aiida@localhost>], [<Group: "Verdi autogroup on 2018-05-04 15:47:59" [type autogroup.run], of user aiida@localhost>]]
Creating a temp group
Number of available Cif data 13019
Adding 120 nodes
Time elapsed in ms 2.14273405075
The size of the group is  120
Deleting the created group
Creating a temp group
Number of available Cif data 13019
Adding nodes one by one
Counter: 100, Time elapsed in ms: 10.6852970123, Throughput (time per 100 nodes): 10.6852970123
Counter: 200, Time elapsed in ms: 21.3331038952, Throughput (time per 100 nodes): 10.6478068829
Counter: 300, Time elapsed in ms: 31.8032779694, Throughput (time per 100 nodes): 10.4701740742
Counter: 400, Time elapsed in ms: 42.4894719124, Throughput (time per 100 nodes): 10.686193943
Counter: 500, Time elapsed in ms: 53.041228056, Throughput (time per 100 nodes): 10.5517561436
Counter: 600, Time elapsed in ms: 63.7002768517, Throughput (time per 100 nodes): 10.6590487957
Counter: 700, Time elapsed in ms: 74.8987460136, Throughput (time per 100 nodes): 11.198469162
Counter: 800, Time elapsed in ms: 86.3199770451, Throughput (time per 100 nodes): 11.4212310314
Counter: 900, Time elapsed in ms: 97.5338709354, Throughput (time per 100 nodes): 11.2138938904
Counter: 1000, Time elapsed in ms: 110.402652979, Throughput (time per 100 nodes): 12.8687820435
Time elapsed in ms 110.403358936
The size of the group is  3
Re-adding nodes one by one (the same ones)
Counter: 100, Time elapsed in ms: 13.5112700462, Throughput (time per 100 nodes): 13.5112700462
Counter: 200, Time elapsed in ms: 28.5747659206, Throughput (time per 100 nodes): 15.0634958744
Counter: 300, Time elapsed in ms: 39.8427219391, Throughput (time per 100 nodes): 11.2679560184
Counter: 400, Time elapsed in ms: 52.189538002, Throughput (time per 100 nodes): 12.3468160629
Counter: 500, Time elapsed in ms: 63.49888587, Throughput (time per 100 nodes): 11.309347868
Counter: 600, Time elapsed in ms: 78.9269368649, Throughput (time per 100 nodes): 15.4280509949
Counter: 700, Time elapsed in ms: 91.5692420006, Throughput (time per 100 nodes): 12.6423051357
Counter: 800, Time elapsed in ms: 104.076093912, Throughput (time per 100 nodes): 12.5068519115
Counter: 900, Time elapsed in ms: 120.774966955, Throughput (time per 100 nodes): 16.6988730431
Counter: 1000, Time elapsed in ms: 132.555281878, Throughput (time per 100 nodes): 11.7803149223
Time elapsed in ms 132.555385828
The size of the group is  1000
Deleting the created group
Creating a temp group
Number of available Cif data 13019
Adding nodes in batches of 100
Counter: 1, Time elapsed in ms: 1.83132791519, Throughput (time per 100 nodes): 1.83132791519, chunk size: 100
Counter: 2, Time elapsed in ms: 4.11650204659, Throughput (time per 100 nodes): 2.28517413139, chunk size: 100
Counter: 3, Time elapsed in ms: 6.58631706238, Throughput (time per 100 nodes): 2.46981501579, chunk size: 100
Counter: 4, Time elapsed in ms: 8.87204313278, Throughput (time per 100 nodes): 2.2857260704, chunk size: 100
Counter: 5, Time elapsed in ms: 11.1532249451, Throughput (time per 100 nodes): 2.28118181229, chunk size: 100
Counter: 6, Time elapsed in ms: 13.1678059101, Throughput (time per 100 nodes): 2.01458096504, chunk size: 100
Counter: 7, Time elapsed in ms: 15.2732419968, Throughput (time per 100 nodes): 2.10543608665, chunk size: 100
Counter: 8, Time elapsed in ms: 17.3607001305, Throughput (time per 100 nodes): 2.0874581337, chunk size: 100
Counter: 9, Time elapsed in ms: 19.3739910126, Throughput (time per 100 nodes): 2.01329088211, chunk size: 100
Counter: 10, Time elapsed in ms: 21.4802761078, Throughput (time per 100 nodes): 2.10628509521, chunk size: 100
Time elapsed in ms 21.4803500175
The size of the group is  1000

Django

(aiidapy) aiida@ubuntu-aiida-vm1:~/issue_1319$ verdi -p dj_issue_1319 run test_script_2.py
#1 All available groups [[<Group: "Verdi autogroup on 2018-04-05 19:53:08" [type autogroup.run], of user aiida@localhost>], [<Group: "Verdi autogroup on 2018-05-04 15:12:54" [type autogroup.run], of user aiida@localhost>], [<Group: "test_group" [user-defined], of user aiida@localhost>]]
Deleting the group  "test_group" [user-defined], of user aiida@localhost
#2 All available groups [[<Group: "Verdi autogroup on 2018-04-05 19:53:08" [type autogroup.run], of user aiida@localhost>], [<Group: "Verdi autogroup on 2018-05-04 15:12:54" [type autogroup.run], of user aiida@localhost>]]
Creating a temp group
Number of available Cif data 14064
Adding 120 nodes
Time elapsed in ms 0.0135979652405
The size of the group is  120
Deleting the created group
Creating a temp group
Number of available Cif data 14064
Adding nodes one by one
Counter: 100, Time elapsed in ms: 0.444473028183, Throughput (time per 100 nodes): 0.444473028183
Counter: 200, Time elapsed in ms: 0.89825797081, Throughput (time per 100 nodes): 0.453784942627
Counter: 300, Time elapsed in ms: 1.35611009598, Throughput (time per 100 nodes): 0.457852125168
Counter: 400, Time elapsed in ms: 1.79574203491, Throughput (time per 100 nodes): 0.439631938934
Counter: 500, Time elapsed in ms: 2.24151396751, Throughput (time per 100 nodes): 0.445771932602
Counter: 600, Time elapsed in ms: 2.69080901146, Throughput (time per 100 nodes): 0.449295043945
Counter: 700, Time elapsed in ms: 3.07898592949, Throughput (time per 100 nodes): 0.38817691803
Counter: 800, Time elapsed in ms: 3.51380491257, Throughput (time per 100 nodes): 0.434818983078
Counter: 900, Time elapsed in ms: 3.96095991135, Throughput (time per 100 nodes): 0.447154998779
Counter: 1000, Time elapsed in ms: 4.41328692436, Throughput (time per 100 nodes): 0.452327013016
Time elapsed in ms 4.41394996643
The size of the group is  3
Re-adding nodes one by one (the same ones)
Counter: 100, Time elapsed in ms: 0.184542179108, Throughput (time per 100 nodes): 0.184542179108
Counter: 200, Time elapsed in ms: 0.351608037949, Throughput (time per 100 nodes): 0.167065858841
Counter: 300, Time elapsed in ms: 0.518448114395, Throughput (time per 100 nodes): 0.166840076447
Counter: 400, Time elapsed in ms: 0.687360048294, Throughput (time per 100 nodes): 0.168911933899
Counter: 500, Time elapsed in ms: 0.858123064041, Throughput (time per 100 nodes): 0.170763015747
Counter: 600, Time elapsed in ms: 1.02387809753, Throughput (time per 100 nodes): 0.165755033493
Counter: 700, Time elapsed in ms: 1.1977159977, Throughput (time per 100 nodes): 0.173837900162
Counter: 800, Time elapsed in ms: 1.37265896797, Throughput (time per 100 nodes): 0.174942970276
Counter: 900, Time elapsed in ms: 1.54253196716, Throughput (time per 100 nodes): 0.169872999191
Counter: 1000, Time elapsed in ms: 1.71694898605, Throughput (time per 100 nodes): 0.17441701889
Time elapsed in ms 1.71701002121
The size of the group is  1000
Deleting the created group
Creating a temp group
Number of available Cif data 14064
Adding nodes in batches of 100
Counter: 1, Time elapsed in ms: 0.0175650119781, Throughput (time per 100 nodes): 0.0175650119781, chunk size: 100
Counter: 2, Time elapsed in ms: 0.0298590660095, Throughput (time per 100 nodes): 0.0122940540314, chunk size: 100
Counter: 3, Time elapsed in ms: 0.0419690608978, Throughput (time per 100 nodes): 0.0121099948883, chunk size: 100
Counter: 4, Time elapsed in ms: 0.0576741695404, Throughput (time per 100 nodes): 0.0157051086426, chunk size: 100
Counter: 5, Time elapsed in ms: 0.0902359485626, Throughput (time per 100 nodes): 0.0325617790222, chunk size: 100
Counter: 6, Time elapsed in ms: 0.105304002762, Throughput (time per 100 nodes): 0.0150680541992, chunk size: 100
Counter: 7, Time elapsed in ms: 0.125602960587, Throughput (time per 100 nodes): 0.0202989578247, chunk size: 100
Counter: 8, Time elapsed in ms: 0.138172149658, Throughput (time per 100 nodes): 0.0125691890717, chunk size: 100
Counter: 9, Time elapsed in ms: 0.153599023819, Throughput (time per 100 nodes): 0.0154268741608, chunk size: 100
Counter: 10, Time elapsed in ms: 0.171733140945, Throughput (time per 100 nodes): 0.0181341171265, chunk size: 100
Time elapsed in ms 0.172496080399
The size of the group is  1000
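The "Adding nodes in batches" runs above split the node list into fixed-size chunks and issue one `add_nodes` call per chunk, timing each batch. The original test scripts were not attached to this thread, so the following is only a minimal sketch of such a chunked loop; the `FakeGroup`-style object is assumed to expose an `add_nodes(list)` method as the AiiDA `Group` class does, and the timing labels are illustrative.

```python
import time

def chunks(seq, size):
    """Yield successive fixed-size chunks from seq."""
    for i in range(0, len(seq), size):
        yield seq[i:i + size]

def add_in_batches(group, nodes, chunk_size=100):
    """Add nodes to a group in chunks, printing per-batch timings.

    `group` is assumed to expose an `add_nodes(list)` method,
    as the AiiDA Group class does.
    """
    start = time.time()
    prev = start
    for counter, batch in enumerate(chunks(nodes, chunk_size), start=1):
        group.add_nodes(batch)
        now = time.time()
        print("Counter: {}, elapsed: {:.3f}s, batch time: {:.3f}s, "
              "chunk size: {}".format(counter, now - start, now - prev,
                                      len(batch)))
        prev = now
    print("Total elapsed: {:.3f}s".format(time.time() - start))
```

Batching amortizes the per-call overhead (session flushes, duplicate checks) over many nodes, which is why the batched runs are an order of magnitude faster than the one-by-one runs in all three backends above.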

@giovannipizzi
Member

@szoupanos, depending on the amount of work left, we need to decide whether to keep this for 1.0 or 1.1.

@sphuber sphuber removed their assignment Dec 3, 2018
@lekah lekah assigned szoupanos and unassigned szoupanos and lekah Dec 12, 2018
@szoupanos szoupanos modified the milestones: v1.0.0, v1.0.0b1 Feb 11, 2019
@szoupanos
Contributor

There has been extensive research into the best solution for this.
The candidate solutions can be found at https://github.com/szoupanos/sqla_vs_dj/tree/master/aiida_schema_exp. The fastest for SQLA are bulk_add_to_group_pure_sql_one_line.py and bulk_add_to_group_sql_models_one_line_dupl.py.

bulk_add_to_group_pure_sql_one_line.py (which can easily be adapted to avoid inserting duplicates) is the fastest; it issues a statement like the following:

INSERT INTO db_dbgroup_dbnodes(dbnode_id, dbgroup_id) values (1, 1), (2, 1), (3, 1), (4, 1), (5, 1)

However, I understand that we would rather use the SQLA interface, at least partially (even if we don't use the ORM), so we will proceed with the bulk_add_to_group_sql_models_one_line_dupl.py solution.
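The pure-SQL variant collapses the whole membership addition into a single multi-row `INSERT` like the one quoted above. As an illustration only (using the stdlib `sqlite3` module as a stand-in for PostgreSQL, a simplified `db_dbgroup_dbnodes` schema, and `INSERT OR IGNORE` as sqlite's counterpart of PostgreSQL's `ON CONFLICT DO NOTHING` for skipping duplicates), building such a statement could look like:

```python
import sqlite3

def bulk_add_to_group(conn, group_id, node_ids):
    """Insert (node, group) pairs into the join table in one statement.

    Builds a single multi-row INSERT, mirroring
    INSERT INTO db_dbgroup_dbnodes(dbnode_id, dbgroup_id) VALUES (1, 1), ...
    'OR IGNORE' skips pairs that already exist (sqlite spelling; on
    PostgreSQL this would be ON CONFLICT DO NOTHING).
    """
    if not node_ids:
        return
    placeholders = ", ".join("(?, ?)" for _ in node_ids)
    params = [v for node_id in node_ids for v in (node_id, group_id)]
    conn.execute(
        "INSERT OR IGNORE INTO db_dbgroup_dbnodes(dbnode_id, dbgroup_id) "
        "VALUES " + placeholders,
        params,
    )

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE db_dbgroup_dbnodes ("
    "dbnode_id INTEGER, dbgroup_id INTEGER, "
    "UNIQUE (dbnode_id, dbgroup_id))"
)
bulk_add_to_group(conn, 1, [1, 2, 3, 4, 5])
bulk_add_to_group(conn, 1, [4, 5, 6])  # already-present pairs are skipped
```

One statement means one round trip to the database and one flush, instead of one per node, which is where the speedup over the per-node ORM path comes from.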

ltalirz pushed a commit that referenced this issue Feb 22, 2019
… the group addition procedure and import method of SQLA should benefit from it (#2471)

Fast group addition, skipping the ORM

This PR solves issue #1319 and provides a fast way to add nodes to a group that circumvents the ORM. Import and export benefit from this, particularly import under SQLA.

In order to achieve this, it uses raw SQL statements. It was agreed by @giovannipizzi, @sphuber and @szoupanos that this is acceptable for aiida 0.X; sqlalchemy >= 1.1 provides better solutions for this in aiida 1.0 (see comments in the code).

Furthermore, it adds a new migration `7a6587e16f4c`.
For consistency, PR #2514 inserts this migration also in the upstream branch for aiida 1.0 (and handles collisions with other following migrations).
@sphuber
Contributor

sphuber commented Mar 4, 2019

Fixed in PR #2518

@sphuber sphuber closed this as completed Mar 4, 2019