Should the nested strategy not be faster than the concurrent files strategy for parallelization? #27
-
You state that the nested strategy "is the fastest option", but in my understanding the concurrent file strategy should be at least as fast when some stages are also parallelized. Do I misunderstand this, or is this left over from documentation written before the concurrent strategy was added? By the way, there is a typo under the concurrent file strategy in the vignette; it's written as …
Replies: 4 comments
-
The documentation states:
If I wrote something different somewhere, please let me know. The nested strategy may or may not be faster. Parallelism is a complex topic and the answer is usually: benchmark it. We are currently running some benchmarks, but right now I do not know whether the nested strategy brings a significant improvement. Basically, you can picture the code as:

```r
for (file in files)          # outer loop: parallelized over files
{
  # stage 1
  for (point in points)      # inner loop: parallelized in some stages only
  {
  }
  # stage 2
  for (point in points)
  {
  }
  # ... further stages
}
```

The outer loop is parallelized. Some inner loops are parallelized. My prediction is: no significant improvement. The gain from parallelizing the full pipeline on 4 cores is much larger than the gain from parallelizing one stage on 10 cores. In particular, decompressing and reading 4 LAZ files simultaneously is a game changer compared to computing, e.g., 10 pixels simultaneously in a rasterization. If you find something drastically different from my prediction (good or bad), please let me know. The stages that are parallelized are:

What I can tell is that …
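To make the loop structure above concrete, here is a toy sketch in base R. It assumes made-up "files" (plain numeric vectors) and two made-up stages, and uses `parallel::mclapply()` both for the outer loop over files and for the inner loop of one stage. This is only an illustration of the nested idea, not lasR's implementation; all names (`stage1`, `stage2`, `process_file`, `run_pipeline`) are hypothetical, and `mclapply()` cannot use more than one core on Windows.

```r
library(parallel)

# Hypothetical stand-ins: each "file" is just a vector of point values.
# Nothing here calls lasR; it only mirrors the loop structure above.
files <- list(a = 1:1000, b = 1001:2000, c = 2001:3000, d = 3001:4000)

# Stage 1: per-point work that is NOT parallelized.
stage1 <- function(points) points * 2

# Stage 2: per-point work that IS parallelized. Points are cut into
# `inner` contiguous chunks that are processed concurrently.
stage2 <- function(points, inner) {
  idx <- ceiling(seq_along(points) * inner / length(points))
  chunks <- split(points, idx)
  unlist(mclapply(chunks, sqrt, mc.cores = inner), use.names = FALSE)
}

# The whole pipeline for one file: stages run one after another.
process_file <- function(points, inner) {
  sum(stage2(stage1(points), inner))
}

# Outer loop over files, `outer` files at a time; each file may use
# `inner` workers in stage 2, so up to outer * inner workers are busy.
run_pipeline <- function(files, outer, inner) {
  mclapply(files, process_file, inner = inner, mc.cores = outer)
}

nested     <- run_pipeline(files, outer = 2, inner = 2)  # nested-like
concurrent <- run_pipeline(files, outer = 4, inner = 1)  # concurrent-files-like
sequential <- run_pipeline(files, outer = 1, inner = 1)
```

Both parallel layouts use up to four workers and return exactly the same sums as the sequential run; only the wall-clock time can differ, which is why the choice between them comes down to benchmarking.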
-
I quoted from https://r-lidar.github.io/lasR/articles/multithreading.html#concurrent-files-strategy, where you wrote:
So if we are using the same number of cores for the outer loop to parallelize on files (e.g. …
-
Absolutely. In theory at least.
Absolutely, in theory. In practice, it is hard to say and should be benchmarked. In the case of … As I said, parallelism is a big topic and the right answer is always: benchmark it. For …
-
Absolutely, if 6 cores are faster than 4 and you have enough memory, which depends on your machine. Again: benchmark it on a subset of your dataset. On a modern computer with 12 or 20 cores, the answer is likely yes.
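To act on "benchmark it on a subset of your dataset", a minimal timing harness in base R could look like the sketch below. `benchmark_settings()` and `toy_run()` are hypothetical names; in practice `run` would wrap your own pipeline call on a few tiles with the given core setting. The toy workload uses `parallel::mclapply()`, which cannot use more than one core on Windows.

```r
library(parallel)

# Hypothetical helper (not part of lasR): time each setting a few times
# and return the median wall-clock time per setting, fastest first.
# `run` is any function that processes a small data subset with the
# given setting; `settings` is a named list of settings to compare.
benchmark_settings <- function(run, settings, times = 3) {
  elapsed <- vapply(settings, function(s) {
    # Repeat each run and take the median to dampen noise from disk
    # caching and other processes running on the machine.
    median(replicate(times, system.time(run(s))[["elapsed"]]))
  }, numeric(1))
  sort(elapsed)
}

# Toy stand-in for a real pipeline run: 12 chunks of CPU-bound work
# spread over `ncores` forked worker processes.
toy_run <- function(ncores) {
  mclapply(1:12, function(i) sum(sqrt(seq_len(2e5))), mc.cores = ncores)
}

timings <- benchmark_settings(toy_run, list(four = 4, six = 6))
print(timings)
```

The harness only measures time; memory use, which matters just as much here, has to be watched separately (for example with your system monitor) while the runs execute.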
Absolutely, in theory. In practice, it is hard to say and should be benchmarked. In the case of `nested(4,2)` on a 20-core machine, I'd say yes, very likely. In the case of `nested(4,2)` on a 10-core machine, in my opinion it is not obvious at all: 4x2 = 8 cores are involved, and on a 10-core machine there is usually no significant gain beyond half the cores. You must also consider the overhead, because parallelization has a cost. Then you must consider how well synchronized the cores in the outer loop are. Imagine stage 3 is parallelized and you are pr…
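The core-count arithmetic can be paired with a back-of-the-envelope model. The sketch below assumes Amdahl's law (a modelling choice for illustration, not something lasR documents): the outer loop scales perfectly over files, and a hypothetical fraction `f` of each file's work sits in parallelized stages that scale perfectly over `inner` cores. It ignores the overhead and synchronization costs described above, so it is an optimistic ceiling rather than a prediction.

```r
# Back-of-the-envelope model (an assumption, not lasR's behaviour):
# - `outer` files are processed at once with perfect scaling;
# - inside one file, a fraction `f` of the work scales perfectly over
#   `inner` cores (Amdahl's law) and the rest stays sequential.
# Thread overhead, I/O and memory contention, and cores waiting on
# each other are all ignored.
ideal_speedup <- function(outer, inner = 1, f = 0) {
  stopifnot(outer >= 1, inner >= 1, f >= 0, f <= 1)
  outer / ((1 - f) + f / inner)
}

# Hypothetical configurations; f = 0.3 is an illustrative guess only.
configs <- data.frame(
  label = c("concurrent files (4)", "nested (4, 2)", "one stage on 10 cores"),
  outer = c(4, 4, 1),
  inner = c(1, 2, 10)
)
configs$cores   <- configs$outer * configs$inner
configs$speedup <- mapply(ideal_speedup, configs$outer, configs$inner,
                          MoreArgs = list(f = 0.3))
print(configs)
```

With `f = 0.3`, the model gives a speedup of 4 for concurrent files on 4 cores, about 4.71 for `nested(4,2)` on 8 cores, and about 1.37 for a single stage on 10 cores. Doubling the cores through the inner loop buys little even before any overhead, which is consistent with the intuition that the outer loop carries most of the gain.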