Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance issue with WhereIn (and with eager loading) #26051

Closed
Arkhas opened this issue Oct 10, 2018 · 33 comments
Closed

Performance issue with WhereIn (and with eager loading) #26051

Arkhas opened this issue Oct 10, 2018 · 33 comments

Comments

@Arkhas
Copy link

Arkhas commented Oct 10, 2018

  • Laravel Version: 5.7.9
  • PHP Version: 7.1.21
  • Database Driver & Version: MySQL 5.5.5 -10.2.15-MariaDB

Description:

Doing a whereIn with prepared query take much more time than doing a raw query

Steps To Reproduce:

Create 10000 users

do :

$user_ids = \App\User::pluck('id');
 \App\User::whereIn('id', $user_ids->sort())->get();

this request take 908ms to execute ,

But when doing :

 \DB::table('users')->select('*')->whereRaw("id in ('1', '2', '3', '4', [...],'10000')")->get();

This on is taking only 68ms to execute.

I have made a demo project here : https://github.com/Arkhas/laravel-perf-wherein

(just run php artisan migrate:refresh --seed and visit the homepage (PHP Debug bar is enable))

it seems that it's an issue with prepared queries of php PDO :

https://laracasts.com/discuss/channels/general-discussion/eloquent-is-so-slow?page=2
https://stackoverflow.com/questions/4350718/why-are-certain-types-of-prepared-queries-using-pdo-in-php-with-mysql-slow

when doing eager loading with eloquent, it seems that it use the first synthax and take a lot more time to process

the question here is : Is it required to do prepared queries for eloquent eager loading ? Or is it possible to eager load using the second synthax?

@sisve
Copy link
Contributor

sisve commented Oct 11, 2018

The two queries doesn't produce the same result, one returns Eloquent models, one doesn't.

You can use something like Blackfire to figure out where the time is spent. Is it really the actual database call, and not anything related like escaping values or hydrating models?

@Arkhas
Copy link
Author

Arkhas commented Oct 11, 2018

you can replace \DB::table('users') by \App\User and is still results in lower performances,

and it's just the db query which takes time :

capture d ecran 2018-10-11 a 10 30 25

capture d ecran 2018-10-11 a 10 31 08

the only difference between the 2 queries is that one is using binding while the other is not

@Kyslik
Copy link
Contributor

Kyslik commented Oct 12, 2018

Are you benching this in one request? Doesn't database do some caching? Try changing the order of execution.

@Arkhas
Copy link
Author

Arkhas commented Oct 12, 2018

No it's on 2 different reload (commenting the one I don't need), and same kind of result append when I do this :

$user_ids = \App\User::pluck('id');
$test_1  = \App\User::whereIn('id', $user_ids->sort())->get(); // 1.37s

capture d ecran 2018-10-12 a 11 50 54

$test_2 = \App\User::whereRaw("id in ('" . implode("','", $user_ids->toArray()) . "')")->get(); // 111.2ms

capture d ecran 2018-10-12 a 11 51 52

And for exemple, doing this :

public function user()
    {
        return $this->belongsTo(User::class, 'id', 'id');
    }

then

$test_3 = \App\User::with('user')->get();

the "relation' query is exectuted in 1.44s
capture d ecran 2018-10-12 a 11 50 03

Is it something that can be fix with a better configuration of the mySQL server ?

I have a pretty standard valet installation on my local env, but the same thing in happening on the production server.

PS: I have reflect these tests in this file :

https://github.com/Arkhas/laravel-perf-wherein/blob/master/routes/web.php

@vlakoff
Copy link
Contributor

vlakoff commented Oct 12, 2018

Maybe the bottleneck is not in the whereIn, but in the beforehand pluck and/or sort?

@Arkhas
Copy link
Author

Arkhas commented Oct 12, 2018

Same thing when the array is manually passed in the array

$test_4 = \App\User::whereIn('id', ['1', '2', [...], '10000'])->get()

capture d ecran 2018-10-12 a 15 13 03

And even if it would take more time to process, it should not have any influence on the query execution time

@JayBizzle
Copy link
Contributor

What happens if you run that query directly in Sequel Pro or something (with query cache disabled)?

I have noticed with some of my own queries that Debugbar sometimes says a query has taken much much longer than it ever does when run directly. I always assumed there there was some overhead of building the Collection that was being included in the timings. Perhaps I'm wrong?

@vlakoff
Copy link
Contributor

vlakoff commented Oct 12, 2018

I have similar performance considerations with whereIn in projects of mine, currently I chunk requests (let's say chunks of 500 ids) and merge the results.

I was assuming it was some kind of saturation on the DB side. Didn't suspect using whereRaw would be faster (if this is confirmed).

@Arkhas
Copy link
Author

Arkhas commented Oct 12, 2018

@JayBizzle the difference in Sequel Pro is that you are not using a prepared query if you just copy and paste that query in Sequel PRO, it seems that it is not related to building the collection, but just the prepared query (you can try to see if there is any difference using this synthax in your own code :

$test_2 = \App\User::whereRaw("id in ('" . implode("','", $user_ids->toArray()) . "')")->get(); // 111.2ms

@vlakoff , we are using the same workaround for some big queries, as it seems that managing 10000 relation in 2 queries is less efficient that managing 20 queries of 1000

@FabienLucini
Copy link

I have exactly the same problem !!
It seems to be in the prepared query, when you have to handle thousands of bindings.

@Kyslik
Copy link
Contributor

Kyslik commented Oct 12, 2018

@JayBizzle It seems that debugbar may take a while to load (render in the browser) but its actually pretty accurate:

Tinker

$ids = \App\User::pluck('id');
$m = microtime(true); \App\User::whereIn('id', $ids)->get(); microtime(true)-$m;
>>> 0.89213609695435

And I get same results form the debugbar ±0.05.


Need to re-trace everything here.

@HadrienKulik
Copy link

I have experienced this issue and came to the same conclusion than @Arkhas: it seems that Eloquent is slowing down queries with large whereIn condition by easily a factor of x10 vs. doing them raw =/

@vlakoff
Copy link
Contributor

vlakoff commented Oct 12, 2018

Just to clarify, is the slowdown due to the SQL prepared statement, or to some processing by Laravel? (… or both?)

@staudenmeir
Copy link
Contributor

@vlakoff The slowdown is caused by the prepared statement. You can reproduce the performance gap with a standalone PDO connection.

The implementation is simple, just not very elegant. Of course, we can only use it for integer keys.
An example for HasMany relationships: staudenmeir@3be00a2

@FabienLucini
Copy link

FabienLucini commented Oct 16, 2018

Ineed, I came to the same conclusion : the slowdown is caused by prepared statments.
So, for eager loading we should be able to use rawqueries to improve performances.
There is IMHO no security concerns that justifies using prepared statements when its about eager loading.

@staudenmeir
Copy link
Contributor

@FabienLucini It's only secure when the keys are integers.

@mfn
Copy link
Contributor

mfn commented Oct 16, 2018

But I think the final observation of @FabienLucini is very valuable. My arguments:

  • the majority of use cases uses integers
  • eager loading of relations is void of user input

Seems like a candidate for performance optimization.

@Arkhas
Copy link
Author

Arkhas commented Oct 16, 2018

I confirm that using the method of @staudenmeir , the performance are dramatically improved :

For this relation :

\App\User::with('user')->get();

Original :

capture d ecran 2018-10-16 a 16 24 43

using :

/**
     * Set the constraints for an eager load of the relation.
     *
     * @param  array  $models
     * @return void
     */
    public function addEagerConstraints(array $models)
    {
        // We'll grab the primary key name of the related models since it could be set to
        // a non-standard name and not "id". We will then construct the constraint for
        // our eagerly loading query so it returns the proper models from execution.
        $key = $this->related->getTable().'.'.$this->ownerKey;

        $whereIn = in_array($this->parent->getKeyType(), ['int', 'integer']) ? 'whereInRaw' : 'whereIn';
        $this->query->$whereIn($key, $this->getEagerModelKeys($models));
}

In the BelongsTo class :

capture d ecran 2018-10-16 a 16 31 44

@staudenmeir said it's not very elegant, but what would be a more elegant way of doing that ?

@Arkhas
Copy link
Author

Arkhas commented Oct 17, 2018

Does anyone know who to ping to know if this solution is viable, or if not using prepared request is a deal breaker for a possible PR ?

@staudenmeir
Copy link
Contributor

Taylor doesn't normally respond to pings on GitHub (from what I have seen). You can try to ask him on Discord.

@HDVinnie
Copy link
Contributor

@Arkhas did you ever hear anything?

@Arkhas
Copy link
Author

Arkhas commented Oct 29, 2018

Nope, I will try to ping @themsaid here :p

@JayBizzle
Copy link
Contributor

JayBizzle commented Nov 5, 2018

Just adding a bit more info, this PHP bug is related I think https://bugs.php.net/bug.php?id=53458

@deleugpn
Copy link
Contributor

deleugpn commented Nov 7, 2018

The performance impact here is incredible. Given the high (common) use of integers as foreign keys, this would be a pretty big benefit for most codebase. I know the solution isn't marvellous, but I think it would be worth pursuing.

@JayBizzle
Copy link
Contributor

JayBizzle commented Nov 7, 2018

Is it at all feasible to put any of the above fixes into a package in the interim?

@staudenmeir
Copy link
Contributor

@JayBizzle It's possible, but IMO a change like this would be much more useful in the core.

Since most Laravel sites run on MySQL and most sites use eager loading with integer keys (my assumptions), this would improve the performance on a lot of sites. Even if they aren't eager loading thousands of records at once.

The individual query may only gain a few microseconds/milliseconds, but over all sites combined, this adds up.

@JayBizzle
Copy link
Contributor

@staudenmeir Yes, I agree this being in core should be the goal, was just hoping for a quick fix as we have lots of instances in our app where this problem is very apparent.

Perhaps we should create a PR to start getting feedback and discussing ideas. Might get more attention than in here.

@staudenmeir
Copy link
Contributor

I'll submit a PR.

@JayBizzle
Copy link
Contributor

Awesome, I will keep an eye out for it 👍

@HDVinnie
Copy link
Contributor

HDVinnie commented Nov 7, 2018

Surprised @taylorotwell or any of the core dev team haven't commented on this yet. Looking forward to seeing the PR.

@staudenmeir
Copy link
Contributor

#26434

@deleugpn
Copy link
Contributor

deleugpn commented Nov 8, 2018

It hadn't occurred to me that this is a Mysql only problem until I read the PR. It made me wonder if there's anything we can do at the query compilation time. What if we just stop using prepared statement for int values when compiling the bindings at the MySQL driver?

@staudenmeir
Copy link
Contributor

@deleugpn The only way I see is adjusting Builder::cleanBindings(). But that wouldn't affect all queries and isn't exactly elegant.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests