Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[5.6] Optimize query builder's pluck() method #23482

Merged
merged 4 commits into from
May 23, 2018

Conversation

franzliedke
Copy link
Contributor

Proposed by @vlakoff in laravel/ideas#98 and yesterday acknowledged by Taylor...

This refactors the pluck method (and, to stay DRY, the get method) of Illuminate\Database\Query\Builder to no longer rely on object_get internally. As reported in the aforementioned issue, this can become a bottleneck when looping over a large number of results.

I've inlined Collection::pluck and then Arr::pluck, subsequently throwing away any overly generic code that is not necessary in this use case - because we know the data type of query results. Also, we don't need the other features of object_get, such as dot notation and default values.

@franzliedke
Copy link
Contributor Author

franzliedke commented Mar 10, 2018

Please note there are currently two failing tests. This is because the test case sets up mocks as if the PDO fetch mode were still $PDO::FETCH_COLUMN. As far as I can tell, the fetch mode is now hard-coded to $PDO::FETCH_CLASS, so technically the test case could be considered incorrect.

What's the status of fetch code? It seems as if it cannot be changed in full-stack Laravel, whereas it can be changed when using illuminate/database as a stand-alone package (via Capsule).

So please tell me which is the correct way forward:

  • Change the tests to return stdClass objects.
  • Change pluck so that it can deal both with arrays and objects.

Even if the second option is preferred, taking care of this change in the tests may be worthwhile as well, as this would make the mocks more realistic (because full-stack Laravel always returns objects).

* @param callable $callback
* @return mixed
*/
protected function onceWithColumns($columns, $callback)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the correct location for the helper method? Or should it be placed below get()?

@vlakoff
Copy link
Contributor

vlakoff commented Mar 11, 2018

Thank you for your work on this. Much appreciated.

Ah ah yes, fetch mode has been removed, see notably #17557. Still, users can modify the fetch mode, so currently your code would break with "array fetch modes".

Off the top of my head:

  • Support object and array, but check has to be outside the loop. Then possibly 2 code branches, each with its loop.
  • Trim stuff such as "dot nested access" and "wildcards"

@vlakoff
Copy link
Contributor

vlakoff commented Mar 11, 2018

Also, keep in mind the PDO::FETCH_CLASS mode (but actually mapping to a user-defined class, not returning simple stdClass): laravel/ideas#98 (comment).

(as a reminder: at some time, laravel was using PDO::FETCH_CLASS to return stdClass instances. It works, but the proper pick should have been PDO::FETCH_OBJ. I've had fixed that since then.)

foreach ($queryResult as $row) {
$itemValue = $row->$column;

if (is_null($key)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would move the is_null($key) function call outside of the loop.

(replacing with $key === null would be fine too, but you know the laravel convention...)

@jmarcher
Copy link
Contributor

Can you provide some benchmarks? I believe this kind of changes does not get merged without some kind of benchmarks.

@franzliedke
Copy link
Contributor Author

I will gladly do that once I get some official buy-in. ;)

@franzliedke franzliedke changed the title [5.6] Optimize query builder's pluck() ,method [5.6] Optimize query builder's pluck() method Mar 16, 2018
@franzliedke franzliedke force-pushed the 5.6-optimize-query-pluck branch 2 times, most recently from 07201ac to f6894c0 Compare March 16, 2018 22:52
@franzliedke
Copy link
Contributor Author

Okay, I refactored a bit, tests are working.

Benchmark

Table with 10885 rows, 1000 iterations
Old method: 21.29s
New method: 11.56s

So, almost 2x speedup.

} else {
$results[$row->$key] = $row->$column;
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would move the is_null() function call outside of the loop:

if (is_null($key)) {
    foreach ($queryResult as $row) {
        $results[] = $row->$column;
    }
} else {
    foreach ($queryResult as $row) {
        $results[$row->$key] = $row->$column;
    }
}

It's a bit longer, but saving thousands of function calls is worth it, and the code is still readable, maybe even a bit better.

} else {
$results[$row[$key]] = $row[$column];
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same is_null() unlooping as in pluckFromObjectColumn().


return tap($callback(), function () use ($original) {
$this->columns = $original;
});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this one is "Laravel style", but sorry it's just incredibly painful to understand, especially if you are not used to this voodoo helper. Can't we just stay with the regular and simple lines:

$results = $callback();

$this->columns = $original;

return $results;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, you're right. (I'm used to it from Ruby land, where it's far more readable.) I've advocated against the overuse of tap in Laravel in the past myself, whoops. Thanks for the feedback.

@vlakoff
Copy link
Contributor

vlakoff commented Mar 18, 2018

Strong +1 for this.

Sure there is a bit of added code, but it's not that much, and it's still very understandable. The DB pluck() method is noticeably slow, and I think such a performance boost is clearly worth it.

Also, would you rerun your benchmark with the unlooped is_null() calls? To prove the benefit ;)

This allows us to reimplement pluck() without calling get(), while
avoiding the duplication of a lot of code.

As a next step, pluck() can be re-implemented without relying on
the (comparatively expensive) same method on the Collection class.
This prepares for the final step of removing (without replacement)
the expensive parts of its implementation (from which, I suspect,
the code will get shorter again).
Because we don't need all the features of data_get() - such as
dot access - there is no need to use the function, which can
become expensive when doing this with many, many rows in a loop.
@franzliedke franzliedke force-pushed the 5.6-optimize-query-pluck branch from f6894c0 to 5e444b7 Compare March 18, 2018 12:18
@franzliedke
Copy link
Contributor Author

@vlakoff I've taken care of your feedback, thank you!

Updated Benchmark

Table with 10885 rows, 1000 iterations
Old method: 22.13s
New method: 10.71s

@carusogabriel
Copy link
Contributor

carusogabriel commented Mar 18, 2018

@franzliedke @vlakoff $foo === null is faster, and safer, IIRC than is_null($foo) :)

@vlakoff
Copy link
Contributor

vlakoff commented Mar 18, 2018

Comparing this bench with the previous one, the is_null() unlooping increased the speedup from 1.84x to 2.07x. Which is good to take, let's optimize completely.

@vlakoff
Copy link
Contributor

vlakoff commented Mar 18, 2018

@carusogabriel Indeed. If it were up to me, I would definitely use this. But the Laravel codebase uses is_null() everywhere.

@franzliedke
Copy link
Contributor Author

Well, since we're not doing it in a loop, this type of micro-optimization won't have any measurable effect anyway.

@carusogabriel
Copy link
Contributor

@vlakoff Maybe is some topic we can put at laravel/ideas.

@franzliedke Isn't micro, we reduce from 3 opcodes to 1, see:

@laravel laravel deleted a comment from vlakoff Mar 18, 2018
@thecrypticace
Copy link
Contributor

thecrypticace commented Mar 18, 2018

The difference in number of opcodes is because of (what I think is called) constant folding. An optimization that is performed by PHP because null === null is always true.

is_null(foo()) gives similar output to foo() === null. The difference being IS_IDENTICAL versus TYPE_CHECK opcodes. It's highly unlikely that this difference matters at all.

@mfn
Copy link
Contributor

mfn commented Mar 19, 2018

Totally like the PR.

But shouldn't we be cautious and rather target master/5.7 with this?


$result = $callback();

$this->columns = $original;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be wrapped in a try/finally block? We want to make sure we always reset them.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It wasn't before my refactoring, so I don't want to change this here...

@vlakoff
Copy link
Contributor

vlakoff commented Apr 4, 2018

So… it seems to be ready. Anything remaining to do?

@vlakoff
Copy link
Contributor

vlakoff commented Apr 5, 2018

Replying to mfn:

But shouldn't we be cautious and rather target master/5.7 with this?

The change is supposed to be transparent ;-)

If ever it breaks something, postponing the change to 5.7 wouldn't have made it better. In such a scenario, we would have to fix it asap or revert the change if not quickly fixable. Would be the same for any branch.

edit: Applying it to 5.7, it would have some "beta testing" phase, indeed. But as it's not changing the API, I still think it should be applied the soonest (and fixed the soonest, in worst case scenario).

@jaketoolson
Copy link

Optimizing pluck (on our own fork) on a project resulted in a previous ~2-minute process optimizing to around 20 seconds. Glad to see someone has also encountered this as well and raised attention to it.

@taylorotwell taylorotwell merged commit 5e444b7 into laravel:5.6 May 23, 2018
@taylorotwell
Copy link
Member

Thanks. Sorry for the delay 😄

@franzliedke franzliedke deleted the 5.6-optimize-query-pluck branch May 23, 2018 17:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants