Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow usage of pk in QueryBuilder project and filter statements #2577

Closed
sphuber opened this issue Mar 6, 2019 · 8 comments
Closed

Allow usage of pk in QueryBuilder project and filter statements #2577

sphuber opened this issue Mar 6, 2019 · 8 comments

Comments

@sphuber
Copy link
Contributor

sphuber commented Mar 6, 2019

The database models use id as the column name for the primary key, but unfortunately, id is a reserved keyword in python. This is also the reason that Django uses pk to refer to the primary key column. In AiiDA we are using a mixture. Some of the ORM classes define both the id and the pk property, where the pk property just proxies through to the id one.

Ideally we would change the database models to use only pk and then we would only use pk throughout the code. Even just restricting the usage of id to the database models and using pk exclusively everywhere else would already be an improvement.

But the worst is that currently sometimes a user can use either name, but sometimes one has to use id. For example, attempting to filter on pk in the QueryBuilder yields the cryptic message:

In [3]: QueryBuilder().append(ProcessNode, filters={'pk': 6820483}, tag='parent', project=['id']).append(ProcessNode, with_ancestors='parent', project=['
   ...: id']).all()
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-3-1269b974abcb> in <module>()
----> 1 QueryBuilder().append(ProcessNode, filters={'pk': 6820483}, tag='parent', project=['id']).append(ProcessNode, with_ancestors='parent', project=['id']).all()

/home/aiida/code/aiida/env/sdb2/aiida-core/aiida/orm/querybuilder.pyc in all(self, batch_size)
   2217         :returns: a list of lists of all projected entities.
   2218         """
-> 2219         return list(self.iterall(batch_size=batch_size))
   2220 
   2221     def dict(self, batch_size=None):

/home/aiida/code/aiida/env/sdb2/aiida-core/aiida/orm/querybuilder.pyc in iterall(self, batch_size)
   2163         :returns: a generator of lists
   2164         """
-> 2165         query = self.get_query()
   2166 
   2167         for item in self._impl.iterall(query, batch_size, self._attrkeys_as_in_sql_result):

/home/aiida/code/aiida/env/sdb2/aiida-core/aiida/orm/querybuilder.pyc in get_query(self)
   2049 
   2050         if need_to_build:
-> 2051             query = self._build()
   2052             self._hash = queryhelp_hash
   2053         else:

/home/aiida/code/aiida/env/sdb2/aiida-core/aiida/orm/querybuilder.pyc in _build(self)
   1839                                any(['path' in d.keys() for d in self._projections[edge_tag]]))
   1840                 aliased_edge = connection_func(
-> 1841                     toconnectwith, alias, isouterjoin=isouterjoin, filter_dict=filter_dict, expand_path=expand_path)
   1842             else:
   1843                 aliased_edge = connection_func(toconnectwith, alias, isouterjoin=isouterjoin)

/home/aiida/code/aiida/env/sdb2/aiida-core/aiida/orm/querybuilder.pyc in _join_descendants_recursive(self, joined_entity, entity_to_join, isouterjoin, filter_dict, expand_path)
   1393         link2 = aliased(self._impl.Link)
   1394         node1 = aliased(self._impl.Node)
-> 1395         in_recursive_filters = self._build_filters(node1, filter_dict)
   1396 
   1397         selection_walk_list = [

/home/aiida/code/aiida/env/sdb2/aiida-core/aiida/orm/querybuilder.pyc in _build_filters(self, alias, filter_spec)
   1307                             column=column,
   1308                             column_name=column_name,
-> 1309                             alias=alias)) for operator, value in filter_operation_dict.items()
   1310                 ]
   1311         return and_(*expressions)

/home/aiida/code/aiida/env/sdb2/aiida-core/aiida/orm/implementation/sqlalchemy/querybuilder.pyc in get_filter_expr(self, operator, value, attr_key, is_attribute, alias, column, column_name)
    266                         raise RuntimeError("I need to get the column but do not know the alias and the column name")
    267                     column = self.get_column(column_name, alias)
--> 268                 expr = self.get_filter_expr_from_column(operator, value, column)
    269 
    270         if negation:

/home/aiida/code/aiida/env/sdb2/aiida-core/aiida/orm/implementation/querybuilder.pyc in get_filter_expr_from_column(cls, operator, value, column)
    154 
    155         if not isinstance(column, (Cast, InstrumentedAttribute, QueryableAttribute, Label, ColumnClause)):
--> 156             raise TypeError('column ({}) {} is not a valid column'.format(type(column), column))
    157         database_entity = column
    158         if operator == '==':

TypeError: column (<type 'property'>) <property object at 0x7fc6af02daa0> is not a valid column

The same goes for when projection e.g. project=['pk'].
We should in increasing importance:

  • Raise a more understandable error when user filters or projects on pk
  • Actually allow to filter and project on pk
  • Completely remove the use of id in QueryBuilder
  • Completely ban the use of id anywhere in the code
@ltalirz
Copy link
Member

ltalirz commented Jul 2, 2019

@CasperWA I remember you mentioning this at some point:
so, the decision is actually to get rid of id in favor of pk entirely

@CasperWA
Copy link
Contributor

CasperWA commented Jul 2, 2019

Aw. Is this due to the fact that id is a special keyword in Python, or? Because I much prefer id over pk, but will of course conform to the decisions made :)

@ltalirz
Copy link
Member

ltalirz commented Jul 2, 2019

Is this due to the fact that id is a special keyword in Python

Yes, this is why django uses PK.

@CasperWA
Copy link
Contributor

CasperWA commented Jul 2, 2019

Is this due to the fact that id is a special keyword in Python

Yes, this is why django uses PK.

And also why we do not want too? (@sphuber, @giovannipizzi)

@sphuber
Copy link
Contributor Author

sphuber commented Jul 2, 2019

Exactly. My opinion is that since id is a reserved keyword, using it in our code will effectively always redefine a built-in. This is of course problematic enough to begin with, but will incur a lot of pylint exclude statements as well. This is indeed the reason why Django uses pk instead. Ultimately, I don't think there is a huge difference between using pk or id, the term "primary key" is even a common term in (relational) databases to use for the attribute that uniquely identifies a record. The most important part is that we make a decision and are consistent. Since pk has no downside and id does, I vote we only use pk.

@giovannipizzi
Copy link
Member

I also vote for consistency and for pk over id since we are using it a lot already.
However I don't think this is the reason of the choice of Django - in Django you can use both, and I think the reason is that the pk of a table does not need to be the id autoincrementing integer column

@sphuber
Copy link
Contributor Author

sphuber commented Jul 2, 2019

Yes, you are correct. The pk just automatically pipes through to whatever the primary key is of the model, which can be defined on the model and is by default id. For consistency then, people in Django use pk, because it will work across models, even if they have different primary key names. This also means that calling pk in Django will be a bit slower than id, since it will have to dereference it, but we can always map pk to id in our backend implementation if it is noticeably problematic.

@sphuber
Copy link
Contributor Author

sphuber commented Jun 1, 2024

This is now implemented by #5088

@sphuber sphuber closed this as completed Jun 1, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants