proposal: sort: Stable with better asymptotic time complexity #25137
Comments
What are the effects on memory allocation? That is, is any extra memory allocated during the sort?
There is an array of 5 * 256 uint8 (the constant is tunable if needed), and a logarithmic amount of stack space is used when symMerge or quickSort are called, because both are recursive (those two already exist in sort.go on master). Neither is essential to the algorithm; both were added as optimizations.
Why would somebody need this over the standard library? Is it better in the general random case? Numbers, please.
Yeah, look under "Benchmark results" in my first post here; I do not know how it could be missed. For more details see the linked CL on Go's Gerrit. EDIT: sorry, I failed to realize that not everybody is familiar with sort_test.go. The benchmark results posted here (produced by func bench) are from sorting pseudorandom data.
It is interesting to note that, with gcc 7.3.1, my code beats the libstdc++ stable_sort when limited to sublinear space/memory usage, starting from about array size 1e8! I roughly ported func bench from sort_test.go to C++. (To clarify, both func bench and my C++ "port" sort seven arrays filled with pseudorandom keys, with sizes ranging from 1e8-3 to 1e8+3 inclusive. The durations are the sums from sorting all seven arrays.) I guess that means libstdc++ also uses an O(n*log(n)*log(n)) algorithm ...

NB: when compiled with compiler optimizations turned on, the C++ code is 4.7 times faster than with them turned off (which is the default); thus much larger array sizes would be needed for my Go code to beat that. See the C++ code here: https://github.com/nsajko/cxxStableSortBench
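For readers who want to reproduce the shape of this measurement in Go, here is a rough, self-contained analogue of what is described above (this is not the actual func bench from sort_test.go, and the per-size seeding is an arbitrary choice of mine): it sorts seven slices of pseudorandom keys with lengths from 1e8-3 to 1e8+3 and reports the summed duration.

```go
package main

import (
	"fmt"
	"math/rand"
	"sort"
	"time"
)

func main() {
	const center = 100_000_000 // 1e8, the array-size region discussed above
	var total time.Duration
	for size := center - 3; size <= center+3; size++ {
		// Fill a slice of the given size with pseudorandom keys.
		rng := rand.New(rand.NewSource(int64(size)))
		keys := make([]int, size)
		for i := range keys {
			keys[i] = rng.Int()
		}
		// Time only the sort itself and accumulate over all seven sizes.
		start := time.Now()
		sort.Stable(sort.IntSlice(keys))
		total += time.Since(start)
	}
	fmt.Println("total for all seven sizes:", total)
}
```

(Expect this to take a while and to need several gigabytes of RAM at these sizes.)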
Is this the same as the suggested algorithm in #7610?
Yes and no. #7610 proposes using the 2008 merging algorithm by Kim and Kutzner, which is the basis of my code. On the other hand, #7610 then links to Wikisort, which is not the same as my sort and in fact uses a more than slightly modified version of the merging algorithm from Kim and Kutzner.
I'm copying from the CL description:
Having an asymptotically better stable sort seems like a good thing to have. At the same time, there is a significant amount of non-trivial new code (~1000 lines plus generated code) that needs thorough vetting. @nsajko Is there anyone besides you who could help validate/review your implementation and confirm that it matches the above paper? Ideally we need another expert in sorting algorithms. Perhaps @vdobler?
Sadly, I do not know anybody who could help. It would certainly be appreciated if @vdobler or anybody else could review the code.
OK. The complexity of the algorithm looks good, and the cited papers increase confidence that you are doing a great job, but is the amount of added code really worth it?
I'll try to review it, but it is really difficult:
The first two points might impose a fundamental problem: The current code of https://go-review.googlesource.com/c/go/+/101415/41
@vdobler Thank you for offering to review! One option to make progress, perhaps starting with a straightforward algorithm that can be more easily reviewed and vetted as correct, would be to have this as a second implementation alongside what we have now, with an appropriately documented (constant) flag to enable/disable it. This would permit checking in code that is disabled when checked in, but that can nevertheless be reviewed and tested on individual machines simply by enabling the flag. Eventually, we would be able to flip the flag. But before we go there I would like to bring this up in the proposal review committee and solicit other opinions. It's not clear to me that this should be a proposal in the first place, for one. Either way, this is certainly not for 1.11 (maybe for 1.12), so there is quite a bit of time here.
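A hypothetical sketch of that gating idea, written outside the standard library (every name here is invented and nothing is taken from the CL): the new algorithm ships disabled behind a constant, and a reviewer flips the constant locally to run tests against it.

```go
// Package stablesort sketches how a second implementation could sit behind a
// constant flag; stableLinearithmic is a placeholder, not the proposed code.
package stablesort

import "sort"

// useLinearithmicStable stays false when checked in; flip it locally to
// exercise the alternative implementation during review and testing.
const useLinearithmicStable = false

// Stable dispatches to one of the two implementations.
func Stable(data sort.Interface) {
	if useLinearithmicStable {
		stableLinearithmic(data)
		return
	}
	sort.Stable(data) // the existing implementation
}

// stableLinearithmic stands in for the proposed O(n*log(n)) algorithm; its
// placeholder body defers to the existing sort so this sketch compiles.
func stableLinearithmic(data sort.Interface) {
	sort.Stable(data)
}
```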
Thanks for your criticisms, @vdobler; I will try to address just some of your points for now. Before doing so, I realize I failed to mention one difference between the Kim & Kutzner paper and my code: K&K put the BDS and MIB at the low-index edges of arrays, while I put them at the high-index edges. With that come other changes, such as (in func merge) rolling the subblocks of the high-index block instead of the low-index block, and many index variables pointing to high-index edges where they point to low-index edges in the paper. I will document that in the next PS.
Do you mean the "Local merges" part in func merge or the entirety of func merge?
Presumably you mean the "outOfInputData" MIB and BDS which cause func merge to take two Interface parameters?
In general, symMerge is O(m * log(m)) (for sorted arrays u and v with m == |u| == |v|), while Hwang & Lin (the version based on rotations, without a workspace buffer) is O(m * m) in general, with the "extended" version described in the paper being O(λ * m) (see Lemma 3 in the paper), where λ is the number of distinct elements in one of u or v; but AFAIU λ equals m-1 in the worst case when we use the algorithm for the local merges (in func merge), making it O(m * m). O(m * m) is asymptotically greater than O(m * log(m)), thus using symMerge instead of Hwang & Lin does not invalidate the complexity analysis. I hope I did not get something wrong.

(Sidenote: please take a look at Theorem 1 in the paper. It proves that the merge takes O(n) assignments, but the reasoning for the case where the rotation-based variant of Hwang & Lin should be used for the local merges is left for the reader to prove at the end, with a pointer to Lemma 3. To be honest, it seems to me that the proof does not even work for that case, i.e. that the merge is O(n * sqrt(n)) in that case instead of O(n). So PTAL at the end of Theorem 1.)
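To make the comparison explicit, the bounds above can be summarized as follows (this only restates the claims in the preceding paragraph; λ is the number of distinct elements as defined there):

```latex
\begin{align*}
  T_{\text{symMerge}}(m) &= O(m \log m) \\
  T_{\text{Hwang--Lin, rotations}}(m) &= O(m \cdot m) \\
  T_{\text{Hwang--Lin, extended}}(m) &= O(\lambda \cdot m) \subseteq O(m \cdot m)
    \quad \text{since } \lambda \le m - 1 \text{ for the local merges.}
\end{align*}
% Because m \log m = O(m \cdot m), replacing Hwang--Lin with symMerge in the
% local merges does not weaken any bound used in the complexity analysis.
```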
Actually, I think the outOfInputData-buffer optimization should not impact the performance of sorting large arrays (like 1e8). But I do not think removing that would simplify the code very much.
Why? Is there a technical reason? The algorithm is complex and not widely known
The whole and any part of it. Let me explain what I would consider reviewable:
Yes, and every comment of the form "as an optimization...". We need to agree on nomenclature. I think your comment is based on the specialization
While reading the paper and your code I thought it might make the code
@griesemer That seems like a very reasonable plan.
The natural way to implement a merge sort is to start from the low indices, which may leave an undersized block at the high-index end; the reason for the high-index placement is to merge that undersized block more efficiently.
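To illustrate only the first sentence, here is a generic, self-contained bottom-up merge sort on ints (not code from the CL): the passes run from the low indices upward, so any leftover block that does not fill a full width ends up undersized at the high-index end.

```go
package main

import "fmt"

// mergeSortBottomUp sorts a stably by merging runs of doubling width,
// starting at index 0; the final run of each pass may be undersized.
func mergeSortBottomUp(a []int) {
	buf := make([]int, len(a))
	for width := 1; width < len(a); width *= 2 {
		for lo := 0; lo < len(a); lo += 2 * width {
			mid := min(lo+width, len(a))
			hi := min(lo+2*width, len(a)) // the block at the high end may be undersized
			mergeInto(buf, a[lo:mid], a[mid:hi])
			copy(a[lo:hi], buf[:hi-lo])
		}
	}
}

// mergeInto merges the sorted runs u and v into dst.
func mergeInto(dst, u, v []int) {
	i, j, k := 0, 0, 0
	for i < len(u) && j < len(v) {
		if v[j] < u[i] { // strict < keeps equal elements in order (stable)
			dst[k], j = v[j], j+1
		} else {
			dst[k], i = u[i], i+1
		}
		k++
	}
	k += copy(dst[k:], u[i:])
	copy(dst[k:], v[j:])
}

func min(x, y int) int {
	if x < y {
		return x
	}
	return y
}

func main() {
	a := []int{5, 3, 8, 1, 9, 2, 7}
	mergeSortBottomUp(a)
	fmt.Println(a) // [1 2 3 5 7 8 9]
}
```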
func merge only implements steps 3 (Block rearrangement) and 4 (Local merges), with the corresponding code blocks marked by comments. I could separate most of the code of func merge into two functions if you really want that, but I think it is simpler as it is now, because the integer variables can be used throughout the code instead of having to be recomputed.
But the paper also uses integer variables ...
But that is what we already have in my code: outOfInputDataMIBuffer implements Interface, and func merge never knows whether its aux Interface is the internal MIB and BDS or the outOfInputData variants.
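For readers following along, a hypothetical sketch of that shape (only outOfInputDataMIBuffer and Interface are names from the actual code; everything else here is invented): the auxiliary buffer satisfies sort.Interface, so the merge routine has a single signature and cannot tell which kind of buffer it was given.

```go
package mergesketch

import "sort"

// externalBuffer is stand-alone scratch storage satisfying sort.Interface,
// playing the role that outOfInputDataMIBuffer plays in the CL.
type externalBuffer struct {
	keys []int
}

func (b *externalBuffer) Len() int           { return len(b.keys) }
func (b *externalBuffer) Less(i, j int) bool { return b.keys[i] < b.keys[j] }
func (b *externalBuffer) Swap(i, j int)      { b.keys[i], b.keys[j] = b.keys[j], b.keys[i] }

// merge is only a signature sketch: data and the auxiliary buffer arrive
// through the same interface type, whether the buffer is carved out of the
// input or lives in separate out-of-input-data storage; the body is elided.
func merge(data, aux sort.Interface) {
	_, _ = data, aux
}
```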
That's a fine explanation and a valid reason. Now this must find its way into

For the rest: It seems I am totally inept at explaining the problem with

If we are following Robert's idea of iteratively building up the new algorithm
Only partly: aux cannot live alone and does not abstract the whole
Please continue with code review here: nsajko/go#1. Also, all issues on that repo will be relevant: https://github.com/nsajko/go/issues
Putting on hold until the code has been reviewed and cleaned up. Part of the decision here will be informed by how much complexity we are taking on, so it makes sense to wait to evaluate that until the code is as clean as it can be made.
It's been a couple of years and the CL has not been updated to try to clean it up. It's an enormous amount of code, and in general there aren't many simple algorithms for this. It seems like we should move this to likely decline.
Sorry for keeping this open. Yes, the code could be cleaned up, but there were other issues too.
The proposed change does not add any new API and is backwards compatible.
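As a concrete illustration of the backwards compatibility claim (this is ordinary use of the existing API, nothing new): callers keep invoking sort.Stable exactly as they do today, and only the algorithm behind it changes.

```go
package main

import (
	"fmt"
	"sort"
)

type person struct {
	name string
	age  int
}

// byAge orders people by age only, so equal ages exercise stability.
type byAge []person

func (s byAge) Len() int           { return len(s) }
func (s byAge) Less(i, j int) bool { return s[i].age < s[j].age }
func (s byAge) Swap(i, j int)      { s[i], s[j] = s[j], s[i] }

func main() {
	people := byAge{{"Ann", 30}, {"Bob", 25}, {"Cid", 30}}
	sort.Stable(people)
	// Stability: Ann stays before Cid because both are 30.
	fmt.Println(people) // [{Bob 25} {Ann 30} {Cid 30}]
}
```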
The current implementation of Stable is very simple, but it is not satisfactorily performant: it executes in O(n*log(n)*log(n)) time.
I propose implementing Stable using an O(n*log(n)) algorithm, which is, however, much more complex. See https://go-review.googlesource.com/c/go/+/101415
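For reference, and roughly speaking (this restates the documented behaviour of the current sort.Stable and the claim above, with no new measurements): the current algorithm does about log(n) merge passes, and each pass uses an in-place SymMerge whose data movement carries another logarithmic factor, which is where the second log comes from.

```latex
% Current Stable (as documented in package sort):
\text{calls to Less: } O(n \log n), \qquad
\text{calls to Swap: } O(n \log n \cdot \log n)
% Proposed:
\text{overall: } O(n \log n)
```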
Code complexity: the following functions with noted cyclomatic complexities are added:
Performance:
Benchmark results using func bench (it sorts pseudorandom data) from sort_test.go (benchmark names include input array size):
The code in master is faster for input arrays with very many identical elements, but those are fast to sort anyway (in both the old and new code they are orders of magnitude faster to sort than random data).