Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tree query improvements #3443

Merged
merged 14 commits into from
Aug 1, 2022

Conversation

SchrodingersGat
Copy link
Member

@SchrodingersGat SchrodingersGat commented Jul 31, 2022

#3438 focused my attention on the PartCategory and StockLocation list API endpoints.

Currently they are quite inefficient!

Taking the PartCategory list endpoint as an example: /api/part/category/?limit=<x>

Number of Results Number of Queries Time (s)
10 24 0.1
100 203 0.3
500 1000 1.0
1000 1994 1.9
5000 8910 8.6

This is very slow! (The StockLocation endpoint has similar issues)

After fixes applied

Number of Results Number of Queries Time (s)
10 6 0.1
100 6 0.1
500 6 0.25
1000 6 0.45
5000 6 2.5

There are two culprits here causing 1 + N query problems.

Part Count

Each category object has a field "parts" which is a count of the number of parts in the category (or any subcategories)

Solution

  • Use a query annotation (with a tree-based subquery) to annotate the queryset and prevent any additional database lookups.

This has been tested, and works very well - query count has been cut in half!

Pathstring

The pathstring field is calculated "on the fly" for each part category. One DB query is required for each item in the results. For thousands of results, this means thousands of queries!

This currently presents a more difficult challenge to fix!

Options to pursue here:

Caching

We could use the "cache" framework to save and load pathstring values, so we can load from memory instead of the database. This works (and has been tested), reducing the problem down to O(1) instead of O(n).

However the default caching framework can only handle a certain number of keys before it runs out. This solution works up to a point, but is always going to be limited by the maximum number of cache entries allowed by the system.

Annotation

We could (maybe) work out a queryset annotation to fetch the required data from the database and reduce the number of DB hits to a constant number.

However this is a harder problem than any existing queryset annotations. Might be not possible as we need to return multiple results per row, which is not generally possible with django's annotation framework.

Pre-Calculate and Store

We could pre-calculate the pathstring field and store in the database. Whenever the model is saved, re-compute and save the "pathstring" again. As these don't change very often, this would be not too bad.

However we would be limited by the available number of characters in the char field. This might be OK if we decide on a mechanism to handle "very long" path strings, maybe something like:

/some/path/which/is/super/long/....../middle/is/cut/out/and/replaced/with-dots/

- Uses subquery to annotate the part-count for sub-categories
- Huge reduction in number of queries
@SchrodingersGat SchrodingersGat added bug Identifies a bug which needs to be addressed api Relates to the API labels Jul 31, 2022
@SchrodingersGat SchrodingersGat added this to the 0.9.0 milestone Jul 31, 2022
Copy link
Member

@matmair matmair left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good for now. Prerendering the path might be the best solution.

@SchrodingersGat SchrodingersGat modified the milestones: 0.9.0, 0.8.0 Aug 1, 2022
@SchrodingersGat SchrodingersGat merged commit 175d955 into inventree:master Aug 1, 2022
@SchrodingersGat SchrodingersGat deleted the category-query branch August 1, 2022 03:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api Relates to the API bug Identifies a bug which needs to be addressed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants