
[8.x] Improve one-of-many performance #37451

Merged 8 commits into laravel:8.x on May 21, 2021.

@cbl (Contributor) commented May 21, 2021

Background

Eager loading one-of-many relationships can become slow for big tables.

Solution

When eager loading, the related models are loaded with a `WHERE foreign_id IN (1, 2, 3, ...)` constraint. For one-of-many relationships, this constraint is currently applied to the parent query:

SELECT *
FROM `logins`
INNER JOIN (
    SELECT MAX(id) AS id
    FROM logins
    GROUP BY logins.user_id
) AS latest_login 
ON latest_login.id = logins.id
WHERE user_id in (1,2,3,4,5) # <---

This means that the subquery computes MAX(id) for every user_id group in the table, not only for the five users actually being loaded.
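To make the cost concrete, here is a scaled-down sketch that uses Python's built-in `sqlite3` module. This is an assumption: the PR benchmarks MySQL, and SQLite's planner behaves differently. The sketch builds a hypothetical `logins` table with 40 users and 25 logins each, then runs only the inner subquery, with and without the foreign key filter, and counts how many MAX(id) groups each version produces.

```python
import sqlite3

# Hypothetical, scaled-down data: 40 users x 25 logins = 1,000 rows
# (the PR's benchmark used 4,000 users and 100,000 logins on MySQL).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logins (id INTEGER PRIMARY KEY, user_id INTEGER NOT NULL)")
conn.execute("CREATE INDEX logins_user_id_index ON logins (user_id)")
conn.executemany(
    "INSERT INTO logins (user_id) VALUES (?)",
    [(user_id,) for _ in range(25) for user_id in range(1, 41)],
)

# Inner subquery as in the original query: no foreign key filter,
# so MAX(id) is computed for every user in the table.
unfiltered = conn.execute(
    "SELECT MAX(id) AS id FROM logins GROUP BY logins.user_id"
).fetchall()

# The same subquery with the eager-loading constraint moved inside it.
filtered = conn.execute(
    "SELECT MAX(id) AS id FROM logins "
    "WHERE user_id IN (1, 2, 3, 4, 5) GROUP BY logins.user_id"
).fetchall()

print(len(unfiltered), len(filtered))  # 40 groups computed vs. the 5 actually needed
```

The outer `WHERE` in the first query throws away 35 of those 40 groups after they have been computed. At the benchmark's scale, that means 4,000 groups computed to keep 5.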

This can be improved by adding the constraint to the subquery:

SELECT *
FROM `logins`
INNER JOIN (
    SELECT MAX(id) AS id
    FROM logins
    WHERE user_id in (1,2,3,4,5) # <---
    GROUP BY logins.user_id
) AS latest_login 
ON latest_login.id = logins.id

`EXPLAIN ANALYZE` results for both queries:

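Results for 4,000 users and 100,000 logins. Total query time drops from about 21 ms with the filter on the parent query to about 0.13 ms with the filter in the subquery.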

Filtering rows in the parent query:

-> Nested loop inner join  (cost=50420.32 rows=500500) (actual time=20.806..21.036 rows=5 loops=1)
    -> Index range scan on logins using logins_user_id_index, with index condition: (logins.user_id in (1,2,3,4,5))  (cost=57.51 rows=125) (actual time=0.048..0.190 rows=125 loops=1)
    -> Index lookup on latest_login using <auto_key0> (id=logins.id)  (actual time=0.001..0.001 rows=0 loops=125)
        -> Materialize  (cost=1650.35..1650.35 rows=4004) (actual time=0.167..0.167 rows=0 loops=125)
            -> Index range scan on logins using index_for_group_by(logins_user_id_index)  (cost=1249.95 rows=4004) (actual time=0.012..17.431 rows=4000 loops=1)

Filtering rows in the subquery:

-> Nested loop inner join  (cost=60.31 rows=125) (actual time=0.121..0.130 rows=5 loops=1)
    -> Filter: (latest_login.id is not null)  (cost=0.13..16.56 rows=125) (actual time=0.113..0.115 rows=5 loops=1)
        -> Table scan on latest_login  (cost=2.50..2.50 rows=0) (actual time=0.000..0.001 rows=5 loops=1)
            -> Materialize  (cost=2.50..2.50 rows=0) (actual time=0.112..0.114 rows=5 loops=1)
                -> Group aggregate: max(logins.id)  (actual time=0.041..0.100 rows=5 loops=1)
                    -> Filter: (logins.user_id in (1,2,3,4,5))  (cost=25.32 rows=125) (actual time=0.021..0.082 rows=125 loops=1)
                        -> Index range scan on logins using logins_user_id_index  (cost=25.32 rows=125) (actual time=0.019..0.063 rows=125 loops=1)
    -> Single-row index lookup on logins using PRIMARY (id=latest_login.id)  (cost=0.25 rows=1) (actual time=0.002..0.002 rows=1 loops=5)
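The plans above measure speed. The sketch below checks that moving the constraint into the subquery does not change which rows come back. It rests on the same assumptions: SQLite through Python's `sqlite3` instead of MySQL, and hypothetical scaled-down data. The `ORDER BY` is added only so the two result lists can be compared directly.

```python
import sqlite3

# Hypothetical data: 40 users x 25 logins, ids assigned 1..1000.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logins (id INTEGER PRIMARY KEY, user_id INTEGER NOT NULL)")
conn.executemany(
    "INSERT INTO logins (user_id) VALUES (?)",
    [(user_id,) for _ in range(25) for user_id in range(1, 41)],
)

# Constraint on the parent query (before this PR).
BEFORE = """
SELECT * FROM logins
INNER JOIN (
    SELECT MAX(id) AS id FROM logins GROUP BY logins.user_id
) AS latest_login ON latest_login.id = logins.id
WHERE user_id IN (1, 2, 3, 4, 5)
ORDER BY logins.id
"""

# Constraint inside the subquery (this PR).
AFTER = """
SELECT * FROM logins
INNER JOIN (
    SELECT MAX(id) AS id FROM logins
    WHERE user_id IN (1, 2, 3, 4, 5)
    GROUP BY logins.user_id
) AS latest_login ON latest_login.id = logins.id
ORDER BY logins.id
"""

before_rows = conn.execute(BEFORE).fetchall()
after_rows = conn.execute(AFTER).fetchall()
print(before_rows == after_rows)  # True: same latest login per user, less work
```

Each row is `(logins.id, logins.user_id, latest_login.id)`. For user 1, the latest login in this data set has id 961.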

How It Works

The subquery is stored in the class property $oneOfManySubQuery. When eager loading or retrieving a single result, the constraints that restrict rows by the foreign key are added to this subquery builder. The subquery is then added to the inner join using beforeQuery, introduced in #37431.


A new public method is added to retrieve the one-of-many subquery builder instance.

The getRestrictionQuery method decides which query the foreign key constraints are added to:

protected function getRestrictionQuery()
{
    return $this->query;
}

protected function getRestrictionQuery()
{
    return $this->isOneOfMany()
        ? $this->oneOfManySubQuery
        : $this->query;
}
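The dispatch that getRestrictionQuery performs can be modeled in a few lines. This is a hypothetical Python sketch, not Laravel's API. `QueryBuilder`, `HasOneOfManySketch` and their snake_case methods are invented for illustration, and they only mirror the names used above. It shows that once `of_many()` has created a subquery builder, the eager-loading `WHERE IN` lands on that subquery instead of the parent query.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class QueryBuilder:
    """Hypothetical stand-in for a query builder; it only records WHERE IN clauses."""
    table: str
    wheres: list[str] = field(default_factory=list)

    def where_in(self, column: str, values: list[int]) -> QueryBuilder:
        self.wheres.append(f"{column} IN ({', '.join(str(v) for v in values)})")
        return self


class HasOneOfManySketch:
    """Hypothetical relation that mirrors the getRestrictionQuery() decision."""

    def __init__(self, foreign_key: str) -> None:
        self.foreign_key = foreign_key
        self.query = QueryBuilder("logins")  # parent (outer) query
        self.one_of_many_sub_query: QueryBuilder | None = None  # set by of_many()

    def of_many(self) -> HasOneOfManySketch:
        self.one_of_many_sub_query = QueryBuilder("logins")
        return self

    def is_one_of_many(self) -> bool:
        return self.one_of_many_sub_query is not None

    def get_restriction_query(self) -> QueryBuilder:
        # Same choice as the PHP above: the subquery for one-of-many, else the parent query.
        return self.one_of_many_sub_query if self.is_one_of_many() else self.query

    def add_eager_constraints(self, parent_keys: list[int]) -> None:
        self.get_restriction_query().where_in(self.foreign_key, parent_keys)


plain = HasOneOfManySketch("user_id")
plain.add_eager_constraints([1, 2, 3])

one_of_many = HasOneOfManySketch("user_id").of_many()
one_of_many.add_eager_constraints([1, 2, 3])

print(plain.query.wheres)                        # ['user_id IN (1, 2, 3)']
print(one_of_many.one_of_many_sub_query.wheres)  # ['user_id IN (1, 2, 3)']
print(one_of_many.query.wheres)                  # []
```

In this toy model the parent query stays unfiltered for one-of-many relations. As discussed below, the real change also leaves the constraint on the parent builder, which does not affect the results.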

ping #37362

@taylorotwell (Member)

So, won't addConstraints be called twice for these queries?

@cbl (Contributor, Author) commented May 21, 2021

@taylorotwell It is called by the constructor, but at that point the relationship does not yet know that it is one-of-many. It must therefore be called again when ofMany is called, so that the constraints are also added to the subquery.

@taylorotwell (Member)

So are the constraints added to both the sub query and the "main" query?

@cbl (Contributor, Author) commented May 21, 2021

@taylorotwell Yes, but that does not affect the performance or the results at all, so it is not necessary to remove the constraint from the parent builder.
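Here is a minimal check of the claim that the duplicated constraint leaves the results unchanged. It again assumes SQLite through Python's `sqlite3` and hypothetical data, and it says nothing about MySQL performance. The outer filter is redundant because every joined row already has the latest id of one of the requested users.

```python
import sqlite3

# Hypothetical data: 40 users x 25 logins.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logins (id INTEGER PRIMARY KEY, user_id INTEGER NOT NULL)")
conn.executemany(
    "INSERT INTO logins (user_id) VALUES (?)",
    [(user_id,) for _ in range(25) for user_id in range(1, 41)],
)

SUBQUERY = """
    SELECT MAX(id) AS id FROM logins
    WHERE user_id IN (1, 2, 3, 4, 5)
    GROUP BY logins.user_id
"""

# Constraint only inside the subquery.
subquery_only = conn.execute(
    f"SELECT * FROM logins INNER JOIN ({SUBQUERY}) AS latest_login "
    "ON latest_login.id = logins.id ORDER BY logins.id"
).fetchall()

# Constraint on both builders, as happens when addConstraints runs twice.
both = conn.execute(
    f"SELECT * FROM logins INNER JOIN ({SUBQUERY}) AS latest_login "
    "ON latest_login.id = logins.id "
    "WHERE logins.user_id IN (1, 2, 3, 4, 5) ORDER BY logins.id"
).fetchall()

print(subquery_only == both)  # True
```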

@taylorotwell merged commit d6831e9 into laravel:8.x on May 21, 2021.
Review comment on these docblock lines:

* @param \Closure|string|null $column
* @param string|null $relation
* @param string|null $column
* @param string|\Closure|null $aggregate

Member:

This order should not have changed.

Contributor (Author):

Okay, I guessed that the types were ordered by probability. I just noticed that the Laravel docs show this exact example.
