Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Making Map.unorderedTraverse stack safe #4463

Merged
merged 7 commits into from
Jul 7, 2023
Merged

Making Map.unorderedTraverse stack safe #4463

merged 7 commits into from
Jul 7, 2023

Conversation

TonioGela
Copy link
Member

Fixes #4461.

The implementation is copied from List.traverse_.

I added just simple tests for the Map implementation, that is the only one that changed.
I'm not sure where to add tests for other unorderedTraverse implementations, they're not worth a TestLaw class IMHO.

@TonioGela
Copy link
Member Author

The implementation calls the Map constructor and uses ++ to concat every couple of maps, so I expect it to produce a lot of GC pressure.
@johnynek should we write a chain-based implementation instead? WDYT?

@johnynek
Copy link
Contributor

You can benchmark if you have time, but I bet the chain based will be faster.

In the case of List for a small to medium list it was faster IIRC to not convert back and fourth from Vector and Chain.

@armanbilge armanbilge added the bug label Jun 20, 2023
@TonioGela
Copy link
Member Author

I noticed that the CI fails for native, I presume it's due to some stack size issue, not stack depth.
In any case, @johnynek I added a couple of benchmarks to the PR, that should test the 2 implementation, via chain and via tree, using 1_000 and 1_000_000 items maps.
WDYT? Are they worth running? I'll leave the laptop running them this night if you think they are :D

@johnynek
Copy link
Contributor

I think they are worth running.

Thanks!

else
G.map(Chain.traverseViaChain(fa.toIndexedSeq) { case (k, a) =>
G.map(f(a))((k, _))
}) { chain => chain.foldLeft(Map.empty[Int, B]) { case (m, (k, b)) => m.updated(k, b) } }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think chain.iterator.toMap would be more efficient potentially since it can use builder internally.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems reasonable, I'll use it for the bench!

@TonioGela
Copy link
Member Author

I think they are worth running.

Thanks!

I'll update you tomorrow, thanks 👋

@TonioGela
Copy link
Member Author

There was no need to wait a night, the bench took ~45 minutes.

These tests were done on a Apple M2 Pro
Scala Version: 2.13.11
JDK:

openjdk 17 2021-09-14
OpenJDK Runtime Environment Temurin-17+35 (build 17+35)
OpenJDK 64-Bit Server VM Temurin-17+35 (build 17+35, mixed mode)

These are the results:

[info] Benchmark                                                  Mode  Cnt        Score       Error  Units
[info] UnorderedTraverseMapBench.unorderedTraverseTupleViaChain1  avgt   25      144.320 ±     0.530  us/op
[info] UnorderedTraverseMapBench.unorderedTraverseTupleViaChain2  avgt   25   192889.802 ±  2467.065  us/op
[info] UnorderedTraverseMapBench.unorderedTraverseTupleViaTree1   avgt   25     1182.669 ±    14.740  us/op
[info] UnorderedTraverseMapBench.unorderedTraverseTupleViaTree2   avgt   25  2687201.422 ± 31730.496  us/op

They show that the unorderedTraverse that uses internally traverseViaChain is ~8 time faster on a 1000 element map and ~14 times faster for a 1_000_000 element map.

I'll use that implementation to fix the stack overflowing bug, as it could possibly solve the Scala Native issue in CI as I expect the Chain and .iterator.toMap to cause less GC pressure.

@TonioGela TonioGela changed the title 4461 Making Map.unorderedTraverse stack safe Jun 20, 2023
Copy link
Contributor

@johnynek johnynek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for doing this!

@TonioGela
Copy link
Member Author

TonioGela commented Jun 20, 2023

@armanbilge made me just notice that according to @djspiewak the M series is not great when it comes to bench marks: https://fosstodon.org/@SethTisue/110500223425754529
Apparently they're too optimized (it's like having a huge L3 cache) and as such they'll profoundly differ from the machines the code will tipically run on.
Atm I have no other machine to run the benchmarks on, so it's time for me to ask for some help :(
Can someone run these benchmarks on a non m1/2 machine?

(if we don't find out noone we can try to run them in CI, but the machines are pretty crappy)

@TonioGela
Copy link
Member Author

TonioGela commented Jun 21, 2023

I've managed to run these benchs on an Intel machine using java 11.0.11 2021-04-20 LTS

These are the results:

[info] Benchmark                                                  Mode  Cnt        Score       Error  Units
[info] UnorderedTraverseMapBench.unorderedTraverseTupleViaChain1  avgt   25      235.831 ±     6.017  us/op
[info] UnorderedTraverseMapBench.unorderedTraverseTupleViaChain2  avgt   25   274436.964 ±  5842.414  us/op
[info] UnorderedTraverseMapBench.unorderedTraverseTupleViaTree1   avgt   25     1883.380 ±    27.511  us/op
[info] UnorderedTraverseMapBench.unorderedTraverseTupleViaTree2   avgt   25  3733516.461 ± 23333.342  us/op

The ratios are more or less the same, 8 times faster in one case, 13 in the other

cc @armanbilge

@TonioGela
Copy link
Member Author

any news on this?

@johnynek johnynek merged commit e379f6e into typelevel:main Jul 7, 2023
@johnynek
Copy link
Contributor

johnynek commented Jul 7, 2023

I don't know the rules these days but went ahead and merged. It's been a while, I reviewed, there were great benchmarks, and in the worst case we can amend.

Thank you!

@TonioGela TonioGela deleted the 4461 branch July 9, 2023 21:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

unorderedTraverse throws a Stackoverflow Error
4 participants