Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

whereNotNull check in relations query #46909

Closed
mihy opened this issue Apr 28, 2023 · 7 comments
Closed

whereNotNull check in relations query #46909

mihy opened this issue Apr 28, 2023 · 7 comments

Comments

@mihy
Copy link

mihy commented Apr 28, 2023

Laravel Version

10.x

PHP Version

8.x

Database Driver & Version

PostgreSQL 13.8

Description

This has been a problem for a long time. As we can find, there are multiple requests for that:

#21308

#36905

But always the answer is that it is not a problem. I would not agree to that.

The problem arises when you want to get the count of relationship items in a table with 500K records.

To get a count of HasMany relationship - Laravel creates SQL:

select count(*) as aggregate from "transactions" where "transactions"."user_id" = 1 and "transactions"."id" is not null

Laravel adds

and "transactions"."id" is not null

because of Illuminate\Database\Eloquent\Relations\HasOneOrMany line 87 in addConstraints() function.

And so, in my case, I have 0,5M records and for comparing, lets look at 2 queries:

A. select count(*) as aggregate from "transactions" where "transactions"."user_id" = 1 and "transactions"."id" is not null
Runs for around 10s

B. select count(*) as aggregate from "transactions" where "transactions"."user_id" = 1
Runs for around 0.5s

So the difference is 20 times.

If there is necessary to keep this whereNotNull as default, let it be, but please let's find a way how we can disable this query for specific cases, when it is important to optimize queries. Any ideas?

Steps To Reproduce

for example - see issue #21308

@cosmastech
Copy link
Contributor

cosmastech commented Apr 28, 2023

The problem is that the constraints are applied when relation is instantiated.

There could be a static method like doNotApplyNullCheck() (similar to Relation::noConstraints()) that would not add this whereNotNull() and you could wrap your query in something like this like:

$transactionsCount = HasOneOrMany::doNotApplyNullCheck(fn() => $user->transactions()->count());

I would rather see a maintainer say "we would accept a solution like this" rather than push a PR that gets rejected. I'll let @driesvints chime in on that.

@driesvints
Copy link
Member

I follow @taylorotwell on this one that changing the current behavior is not desirable for the risk of such a large BC break. An alternative may be explored through a PR but it's Taylor who will decide on that, not me. If anyone could send in a PR with an alternative solution to this issue then he can have a look. Thanks

@cosmastech
Copy link
Contributor

I looked into adding this as a global scope so it could be removed by the client code, but it would break the expectation that $model->someRelationship()->withoutGlobalScopes() is removing model related scopes only, as opposed to something that has already been added to the query for many version of Laravel.

Mostly, I don't think the juice is worth the squeeze on this one. My suspicion is that almost anything that would be submitted to rectify this issue would be marked as a breaking change or suggested to build into your own package. 🤷

You could always write:

$count = null;
Relation::noConstraints(function() use ($user, &$count) {
  $count = $user->transactions()->where('user_id', $user->id)->count();
});
// or
Transactions::query()->where('user_id', $user->id)->count();

Though I'm quite amazed that your DB query is really that slow? Can you not add an index on transactions.id?

@mihy
Copy link
Author

mihy commented Apr 29, 2023

Though I'm quite amazed that your DB query is really that slow? Can you not add an index on transactions.id?

As this is FK PostgreSQL already uses it as an index. On my local env, I don't see a big difference, but on the server (cloud) I get to notice one. The problem is not in resources, but in principle - query should be optimal and as we can see - it is not. I am using Laravel for almost 8 years already and I don't find any reason there should be this additional query.

And reason I find out that was - I am using Nova (also a product of @taylorotwell ) and when you open one resource you can list all related resources as well (page of one user - and lists of transactions). and enabling debugbar you can see all queries created by this page - and there it was. Very slow and strange query.

@trip-somers
Copy link

FWIW, this is a major complication for the jenssegers/laravel-mongodb package, too. In MongoDB, the extra NOT NULL check results in a full collection scan, completely negating any index you might have in place. I think I am going to explore a trait that will replace some part of the query builder for hasMany() relationships.

@trip-somers
Copy link

trip-somers commented Jul 25, 2023

I will overload the abstract HasOneOrMany class to replace the addConstraints() method with the following:

    /**
     * Set the base constraints on the relation query.
     *
     * @return void
     */
    public function addConstraints()
    {
        if (static::$constraints) {
            $query = $this->getRelationQuery();

            $query->where($this->foreignKey, '=', $this->getParentKey());

            if ($this->getParentKey() === null) {
                $query->whereNotNull($this->foreignKey);
            }
        }
    }

... then I'll overload HasOne and HasMany to use this new HasOneOrMany extension.

... then I'll need to overload the HasRelations trait to use the new HasOne and HasMany classes.

... then I'll need to use that new trait in all of my models.

That should do it. Pretty silly, though.

Maybe someone can just add that little if-clause for 11.x...

@trip-somers
Copy link

Another alternative would be to just not run the query in the first place if the parent model hasn't been stored in the database yet. That would probably make the most sense. It's always going to return an empty set anyway, so why even run it?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants