-
Notifications
You must be signed in to change notification settings - Fork 46
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement sort
and argsort
for one dimensional SimpleArray
in C++
#456
base: master
Are you sure you want to change the base?
Conversation
@yungyuc Here's the basic implementation of the sorting functions. Please take a look and give me some feedback when you are available. |
You can first focus on the in-place version of the wrapper for sorting and worry about the out-of-place version that returns an array.
Please review how numpy implements |
In order to make the datatype of return value of SimpleArray::argsort compatible to the wrapped python datatypes, the return value is changed to SimpleArray<uint64_t>. Function argsort is also wrapped in this commit.
@yungyuc Thanks for advice, I've wrapped the |
Please pass all CI runs and add annotations inline. |
cpp/modmesh/buffer/SimpleArray.hpp
Outdated
@@ -585,6 +586,102 @@ class SimpleArray | |||
} | |||
} | |||
|
|||
void sort(void) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The inplace sorting function that currently support only 1-D SimpleArray
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You might consider using the composition design like SimpleArrayMixinCalculators
.
modmesh/cpp/modmesh/buffer/SimpleArray.hpp
Line 122 in b6adfc9
class SimpleArrayMixinCalculators |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I concur. This helper should be put in the calculator mixin.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I concur.
cpp/modmesh/buffer/SimpleArray.hpp
Outdated
std::sort(begin(), end()); | ||
} | ||
|
||
SimpleArray<uint64_t> argsort(void) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The function that returns a SimpleArray containing the indices which will sort the array
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
argsort()
is of the same group as sort()
and should also go to SimpleArrayMixinCalculators
.
In PR we do not add comments for every detail. Please be comprehensive when addressing review comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And the function is too long and should be moved outside the class declaration.
cpp/modmesh/buffer/SimpleArray.hpp
Outdated
|
||
{ // Return array initialization | ||
uint64_t cnt = 0; | ||
std::for_each(ret.begin(), ret.end(), [&cnt](uint64_t & v) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Initialize the array to be returned to the ascending indices
cpp/modmesh/buffer/SimpleArray.hpp
Outdated
} | ||
|
||
value_type const * buf = body(); | ||
auto cmp = [buf](uint64_t a, uint64_t b) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A comparison function that take the input as indices of the SimpleArray and compare the value of the SimpleArray at the given index.
Use this function as the comparison function for std::sort
can make the return index array sorted by the data stored in the SimpleArray
cpp/modmesh/buffer/SimpleArray.hpp
Outdated
return ret; | ||
} | ||
|
||
void apply_argsort(SimpleArray<uint64_t> const & sorted_args) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A function that takes the return value of argsort
of another SimpleArray and reorder current SimpleArray
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please point me to the document link to the corresponding API in numpy?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The implementation of apply_argsort()
has too many lines and need to be taken outside the class declaration, like:
class A {
void helper();
};
A::helper()
{
// long implementation.
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please point me to the document link to the corresponding API in numpy?
This function is intend to achieve the functionality of numpy.take_along_axis
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The implementation of
apply_argsort()
has too many lines and need to be taken outside the class declaration, like:
What I consider moving the implementation out from the declaration is to move the implementation to SimpleArray.cpp
, but I'm wondering if I can move this out because it is still a template function.
If you mean just move the implementation from the class declaration to somewhere beneath in the SimpleArray.hpp
. Then I have no problem about it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you mean just move the implementation from the class declaration to somewhere beneath in the
SimpleArray.hpp
. Then I have no problem about it.
This is what I meant: to move the implementation outside the class declaration but keep it in the header file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please point me to the document link to the corresponding API in numpy?
This function is intend to achieve the functionality of
numpy.take_along_axis
.
What is the rationale to name it otherwise but not take_along_axis
?
cpp/modmesh/buffer/SimpleArray.hpp
Outdated
return; | ||
} | ||
|
||
std::vector<bool> applied_arg(shape()[0], false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A data storage that stores whether the indices in sorted_args
is already applied.
When this array is all true
, the index application is done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a comment to indicate that this only support 1D array. In the same time, please prepare for implementing for multi-dimensional array sorting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK
cpp/modmesh/buffer/SimpleArray.hpp
Outdated
return true; | ||
}; | ||
|
||
auto next = [](std::vector<bool> & vec, ssize_t last) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A helper function that returns the next index not yet applied
@yungyuc @ThreeMonth03 please review this PR. |
sort
and argsort
for one dimensional SimpleArray
in C++ (draft)sort
and argsort
for one dimensional SimpleArray
in C++
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You do not have a test case for apply_argsort()
and the time complexity is horrible. If you are not ready to have the function reviewed, remove it from the PR.
- Move the
sort()
helper toSimpleArrayMixinCalculators
. - Add dimension number to the error message.
- Confirm API consistency of
apply_argsort()
to numpy. - The implementation of
apply_argsort()
has too many lines and need to be taken outside the class declaration. - Always use curly braces for blocking.
- Use
std::any_of()
: https://en.cppreference.com/w/cpp/algorithm/all_any_none_of - Prefer pre-increment
++i
instead of post-incrementi++
. They mean differently. -
apply_argsort()
applies sorted index and should be$O(n)$ .
cpp/modmesh/buffer/SimpleArray.hpp
Outdated
@@ -585,6 +586,102 @@ class SimpleArray | |||
} | |||
} | |||
|
|||
void sort(void) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I concur. This helper should be put in the calculator mixin.
cpp/modmesh/buffer/SimpleArray.hpp
Outdated
{ | ||
if (ndim() != 1) | ||
{ | ||
throw std::runtime_error("SimpleArray: Sorting is only supported in 1D array."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add the dimension number in the error message. See the example:
modmesh/cpp/modmesh/buffer/SimpleArray.hpp
Line 306 in 74a3a0f
throw std::runtime_error(Formatter() << "SimpleArray: shape byte count " << nbytes |
Do not capitalize after :
. That is, write SimpleArray: sorting
.
cpp/modmesh/buffer/SimpleArray.hpp
Outdated
@@ -585,6 +586,102 @@ class SimpleArray | |||
} | |||
} | |||
|
|||
void sort(void) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I concur.
cpp/modmesh/buffer/SimpleArray.hpp
Outdated
return ret; | ||
} | ||
|
||
void apply_argsort(SimpleArray<uint64_t> const & sorted_args) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please point me to the document link to the corresponding API in numpy?
cpp/modmesh/buffer/SimpleArray.hpp
Outdated
return ret; | ||
} | ||
|
||
void apply_argsort(SimpleArray<uint64_t> const & sorted_args) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The implementation of apply_argsort()
has too many lines and need to be taken outside the class declaration, like:
class A {
void helper();
};
A::helper()
{
// long implementation.
}
cpp/modmesh/buffer/SimpleArray.hpp
Outdated
return; | ||
} | ||
|
||
std::vector<bool> applied_arg(shape()[0], false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a comment to indicate that this only support 1D array. In the same time, please prepare for implementing for multi-dimensional array sorting.
cpp/modmesh/buffer/SimpleArray.hpp
Outdated
{ | ||
for (auto i : vec) | ||
{ | ||
if (i == false) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Always use curly braces for blocking, e.g.,:
if (i == false)
{
return false;
}
cpp/modmesh/buffer/SimpleArray.hpp
Outdated
|
||
std::vector<bool> applied_arg(shape()[0], false); | ||
|
||
auto all = [](std::vector<bool> & vec) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use std::any_of()
: https://en.cppreference.com/w/cpp/algorithm/all_any_none_of
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it
cpp/modmesh/buffer/SimpleArray.hpp
Outdated
|
||
auto next = [](std::vector<bool> & vec, ssize_t last) | ||
{ | ||
for (ssize_t i = last; i < static_cast<ssize_t>(vec.size()); i++) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Prefer pre-increment ++i
instead of post-increment i++
. They mean differently.
cpp/modmesh/buffer/SimpleArray.hpp
Outdated
}; | ||
|
||
ssize_t idx = 0; | ||
while (!all(applied_arg)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No! This call to all()
makes next()
gives unnecessary big complexity too. Applying index should be
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found that all()
is
However, I consider next()
a
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think apply argsort indices should be
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've remove the all()
judgement and use next()
for the only termination condition of the loop. And I think this function then achieve
@yungyuc all the problems you mentioned has been fixed except for the one related to complexity. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-
argsort()
is of the same group assort()
and should also go toSimpleArrayMixinCalculators
. - Please reword to be like
SimpleArray::take_along_axis() supports only 1D array but the array is N dimension
(replacing N as the variable). -
apply_argsort()
should be$O(n)$ . - Test
take_along_axis()
too, since you implemented it.
cpp/modmesh/buffer/SimpleArray.hpp
Outdated
std::sort(begin(), end()); | ||
} | ||
|
||
SimpleArray<uint64_t> argsort(void) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
argsort()
is of the same group as sort()
and should also go to SimpleArrayMixinCalculators
.
In PR we do not add comments for every detail. Please be comprehensive when addressing review comments.
cpp/modmesh/buffer/SimpleArray.hpp
Outdated
std::sort(begin(), end()); | ||
} | ||
|
||
SimpleArray<uint64_t> argsort(void) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And the function is too long and should be moved outside the class declaration.
cpp/modmesh/buffer/SimpleArray.hpp
Outdated
{ | ||
if (indices.ndim() != 1) | ||
{ | ||
throw std::runtime_error(Formatter() << "SimpleArray: sorting is only supported in 1D array. " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please reword to be like SimpleArray::take_along_axis() supports only 1D array but the array is N dimension
(replacing N as the variable).
cpp/modmesh/buffer/SimpleArray.hpp
Outdated
|
||
std::vector<bool> applied_arg(shape()[0], false); | ||
|
||
auto next = [](std::vector<bool> & vec, ssize_t last) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do not use next()
. It prevents you from making
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think although the time complexity of the next()
is O(n), the total time complexiy of void SimpleArray<T>::take_along_axis(SimpleArray<uint64_t> const & indices)
is O(n).
It is because while ((idx = next(applied_arg, idx)) != -1)
only visit every number once from 0 to n.
By the way, here is the analysis of the gpt-o3-mini.
https://chatgpt.com/share/67a1fc66-a350-800a-910b-73f4a845664b
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is prefered to use std::vector<char> & vec
or std::bitset<n> vec
instead of std::vector<bool> & vec
. The performance of std::vector<bool> & vec
is awful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree the overall complexity should still be
@KHLee529 , please avoid the lambda. It will make the code clearer and better optimized. The use of lambda also makes it difficult if not impossible for SIMD.
Please also avoid bound-checking as much as possible.
Is there a way to reduce the loop and/or avoid the nesting?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree the overall complexity should still be O ( n ) , but the split of nested loop with the inner one in the lambda is hard to read.
@KHLee529 , please avoid the lambda. It will make the code clearer and better optimized. The use of lambda also makes it difficult if not impossible for SIMD.
Please also avoid bound-checking as much as possible.
Is there a way to reduce the loop and/or avoid the nesting?
I notice that the space complexity of this function is O(n). If we directly create the new buffer vector<T> buffer
and rearrange the elements in the vector<T> buffer
, the space complexity is also O(n). Additionally, we could make the code clear and straight forward.
@@ -824,6 +824,17 @@ def test_SimpleArray_SimpleArrayPlex_type_switch(self): | |||
self.assertEqual( | |||
str(type(arrayplex_int32_2)), "<class '_modmesh.SimpleArray'>") | |||
|
|||
def test_sort(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Test take_along_axis()
too, since you implemented it.
@yungyuc Problems mentioned have been fixed. Please review the modification. |
Description
Basic sorting functions,
sort
andargsort
, requested in issue #435 are implemented in this PR.Current Stage
SimpleArray::sort
This C++ function is the inplace sorting function for
SimpleArray
. It currently supports only 1 dimensional array.It basically just wrap the
std::sort
from C++ STL with dimension check.SimpleArray::argsort
This is the function that returns an array in which the indices are stored in the order that the array is sorted.
For example, if the array is
arr = [1, 9, 7, 3, 5]
, the returned array will be[0, 3, 4, 2, 1]
indicating thatarr[0]->arr[3]->arr[4]->arr[2]->arr[1]
is the sorted order ofarr
.SimpleArray::apply_argsort
This function is to apply the result that returned by
argsort
of anotherSimpleArray
to make it sorted based on another array.A concrete example is shown below.
Current Problem
Option[SimpleArray]
in C++. as the prototype ofsort
are intended to be the following as mentioned in issue Develop helpers for TimeSeriesDataFrame to sort data and reorder data #435?argsort
be like?TimeSeriesDataFrame.reorder
be like?