
[5.6] Faster implementation for Collection::mapToDictionary #22774

Merged: 4 commits into laravel:5.6 on Jan 14, 2018

Conversation

@Lapayo (Contributor) commented Jan 13, 2018

I just ran into huge performance problems when eager loading several thousand rows (~20k in my case).

I tracked it down to the Collection::mapToDictionary method, which turns out to be very slow when dealing with this amount of data.

I rewrote the method without map-reduce and it is much faster now; a sketch of the idea is shown after the benchmark below. Let me know what you think!

The following benchmark was taken on an i7-6700 machine. (Benchmark script attached).
benchmap.zip

time for 100 (current): 5.4161071777344E-5
time for 20k (current): 17.208889961243
time for 25k (current): 35.498338937759
time for 100 (new): 2.5948047637939E-5
time for 20k (new): 0.012021064758301
time for 25k (new): 0.016914844512939
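
(For illustration, a single-pass rewrite along these lines might look roughly like the sketch below. This is not necessarily the exact code from the PR's commits; it assumes the usual Illuminate\Support\Collection internals such as each() and new static().)

```php
// Sketch only: build the dictionary in one pass with ->each() instead of
// rebuilding an accumulator on every reduce step.
public function mapToDictionary(callable $callback)
{
    $dictionary = [];

    $this->each(function ($item, $key) use ($callback, &$dictionary) {
        $pair = $callback($item, $key);

        $pairKey = key($pair);
        $pairValue = reset($pair);

        // Appending auto-creates the bucket, so no isset() check is needed.
        $dictionary[$pairKey][] = $pairValue;
    });

    return new static($dictionary);
}
```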

@thecrypticace (Contributor)

Building arrays with reduce is quadratic, so this isn't surprising. 👍

I'd suggest using a foreach loop if performance is the ultimate goal here. On a million items it ranges between 4x and 15x faster.

In the typical case, using foreach in mapToDictionary on a million items runs in 0.8s–4.1s for me (on PHP 7.1 and 7.2), whereas using $collection->each runs in 12s–17s.

(All benchmarks on Homestead came in under 1s for a million items with foreach; on macOS it's ~4s.)
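
(A plain-foreach variant, roughly what was eventually merged according to the comment further down, could look like the sketch below; the exact merged code may differ.)

```php
// Sketch only: iterate the underlying items array directly, avoiding the
// per-item closure call that ->each() adds.
public function mapToDictionary(callable $callback)
{
    $dictionary = [];

    foreach ($this->items as $key => $item) {
        $pair = $callback($item, $key);

        $pairKey = key($pair);
        $pairValue = reset($pair);

        $dictionary[$pairKey][] = $pairValue;
    }

    return new static($dictionary);
}
```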

@mfn (Contributor) commented Jan 13, 2018

> I'd suggest using a foreach loop if performance is the ultimate goal here.

I'd like to see this too. @Lapayo, can you benchmark with a plain foreach?

@Lapayo (Contributor, Author) commented Jan 13, 2018

I just did; it's indeed a lot faster, depending on the use case. (y)
If the plain version is more likely to be merged, I can commit that one. I wasn't sure whether ->each() might be preferred.

time for 20k (plain): 0.0091784229278564
time for 20k (->each): 0.011384341001511
time for 100k (plain): 0.05624255490303
time for 100k (->each): 0.1215734539032
benchplainvseach.zip
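
(For readers without the attachment: a minimal harness along these lines is sketched below. It is hypothetical, not the attached script; it assumes illuminate/support is installed, and the generated rows and the callback are made up for illustration.)

```php
<?php

require __DIR__.'/vendor/autoload.php';

use Illuminate\Support\Collection;

$n = 20000;

// Generate $n fake rows spread across 10 groups.
$rows = Collection::times($n, function ($i) {
    return ['group' => $i % 10, 'value' => $i];
});

$start = microtime(true);

// Time a single mapToDictionary() pass over the rows.
$dictionary = $rows->mapToDictionary(function ($row) {
    return [$row['group'] => $row['value']];
});

printf("time for %d: %s\n", $n, microtime(true) - $start);
```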

@mfn (Contributor) commented Jan 13, 2018

@Lapayo very much appreciated!

I can't comment on the likelihood, but I would definitely favor the plain foreach variant.

Collections are by definition "things of many", so almost any operation has to loop in some way, and this is a hot code path: even the tiniest slowdown accumulates on larger sets, as you've demonstrated.

@GrahamCampbell GrahamCampbell changed the title Faster implementation for Collection::mapToDictionary [5.6] Faster implementation for Collection::mapToDictionary Jan 14, 2018
@GrahamCampbell GrahamCampbell changed the title [5.6] Faster implementation for Collection::mapToDictionary [5.7] Faster implementation for Collection::mapToDictionary Jan 14, 2018
@deleugpn (Contributor)

I wouldn't go as far as to say that the second level of improvement (plain foreach over ->each) is that important. Imagine 100k items of actual Eloquent records: you'd need a lot of memory available to avoid memory exhaustion, which would push you towards chunking instead.
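
(For context, the chunking alternative mentioned here typically looks something like the following sketch; User is a made-up model used purely for illustration.)

```php
use App\User;

// Process records in fixed-size batches instead of materializing
// 100k Eloquent models in a single collection.
User::query()->chunk(1000, function ($users) {
    foreach ($users as $user) {
        // handle each record; only 1000 models are in memory at a time
    }
});
```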

@taylorotwell taylorotwell changed the base branch from master to 5.6 January 14, 2018 14:56
@taylorotwell taylorotwell merged commit d8b2eb1 into laravel:5.6 Jan 14, 2018
@taylorotwell (Member)

Used foreach. Thanks.

@GrahamCampbell GrahamCampbell changed the title [5.7] Faster implementation for Collection::mapToDictionary [5.6] Faster implementation for Collection::mapToDictionary Jan 14, 2018
@mfn (Contributor) commented Jan 14, 2018

@deleugpn

> I wouldn't go as far as to say that the second level of improvement (plain foreach over ->each) is that important. Imagine 100k items of actual Eloquent records: you'd need a lot of memory available to avoid memory exhaustion, which would push you towards chunking instead.

I see where you're coming from, but I think you have to look beyond that.

This is a PR for the base Support\Collection, not Eloquent\Collection, so 100k non-Eloquent items may well make sense for some, too.
Sure, at some point, if performance is such an issue, you'd probably forgo using such a wrapper class entirely.

But my take is: having identified a hot code path and found a solution which a) is 100% backwards compatible and b) has much better performance characteristics is nothing to disagree with; quite the contrary.

@gabrielkoerich (Contributor)

Why not 5.5?

@tnorthcutt

+1, would love to see this merged into 5.5 as well.
