-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Python buffer objects can have negative strides. #782
Conversation
Indeed, Eigen doesn't (but might someday). For pybind's Eigen support, it should make a copy when the reference type is non mutable (e.g. Making it work should be pretty easy: you'd just need to add a couple of "return false;" in |
Also take a look at the travis-ci failures: clang wants an explicit (ssize_t) cast on the shape in |
@jagerman, thanks so much for your pointers. This change was not as trivial as that, but it's working correctly now. I did not change |
Needing to do two copies is definitely undesirable. I think in that case we are better off doing an element-by-element copy, bypassing the Eigen::Map. I think this will work: if (fits.negativestrides) {
// Eigen does not support negative strides, so we need to make a copy here with normal strides.
// TODO: when Eigen bug #747 is fixed, remove this if case, always execute the else part.
// http://eigen.tuxfamily.org/bz/show_bug.cgi?id=747
value = Type(fits.rows, fits.cols);
if (dims == 1) {
auto input = buf.unchecked_reference<1>();
ssize_t i = 0;
for (EigenIndex c = 0; c < fits.cols; c++) for (EigenIndex r = 0; r < fits.rows; r++)
value(r, c) = input(i++);
}
else {
auto input = buf.unchecked_reference<2>();
for (EigenIndex c = 0; c < fits.cols; c++) for (EigenIndex r = 0; r < fits.rows; r++)
value(r, c) = input(r, c);
}
} else {
value = Eigen::Map<const Type, 0, EigenDStride>(buf.data(), fits.rows, fits.cols, fits.stride);
} |
Sorry, I don't understand. Then you're still doing a second copy. You are replacing copy #2 by a different copy. What we'd want to do is avoid the first call to Actually, coming to think of it, if COPY 1 happens, then the strides will not be negative, no? If so, COPY 2 will not happen. COPY 2 can only happen if the first bool load(handle src, bool) {
auto buf = array_t<Scalar>::ensure(src); // *** COPY 1 ***
if (!buf)
return false;
auto dims = buf.ndim();
if (dims < 1 || dims > 2)
return false;
auto fits = props::conformable(buf);
if (!fits)
return false; // Non-comformable vector/matrix types
if (fits.negativestrides) {
// Eigen does not support negative strides, so we need to make a copy here with normal strides.
// TODO: when Eigen bug #747 is fixed, remove this if case, always execute the else part.
// http://eigen.tuxfamily.org/bz/show_bug.cgi?id=747
auto buf2 = array_t<Scalar,array::forcecast || array::f_style>::ensure(src); // *** COPY 2 ***
if (!buf2)
return false;
// not checking sizes, we already did that
fits = props::conformable(buf2);
value = Eigen::Map<const Type, 0, EigenDStride>(buf2.data(), fits.rows, fits.cols, fits.stride);
} else {
value = Eigen::Map<const Type, 0, EigenDStride>(buf.data(), fits.rows, fits.cols, fits.stride);
}
return true;
} |
I see the compile bots don't understand |
COPY 1 generally only copies if type conversion is needed (e.g. you pass a numpy array of ints to an Eigen matrix of floats), or when what you pass in isn't an array at all (e.g. a list). But if it happens, I believe you'll get something with default (row major) strides, which we can copy via the Eigen map. (There are still two copies, but the first is also doing a conversion). |
Re assert testing: lines 11-13 of test_buffers.py has an example (there are many others through the test suite). |
Re: copying, instead of using Eigen to copy out of numpy, we could add something to tell numpy to copy into a destination, which would avoid a copy in some cases, and avoid the need for the special case for negative strides. |
(I'm not asking you to do that, just thinking out loud) |
|
Right, I was a bit imprecise: it's the assignment of the Eigen::Map to |
I see what you mean now. It's a matrix that is passed to the called function, not the map, so a copy needs to be made. So you are suggesting to create the matrix, give Python a buffer view of it, and ask python to copy the data into it? That sounds very reasonable. I don't think I'll have time to implement that, sorry. |
I just filed a PR against your branch that makes numpy do the conversion. (I've never filed a PR against a PR before--I'm hoping that if you merge it, it'll automatically show up here). |
Thank you so much, that looks pretty simple, but it would have taken me a few days to figure out how to do it. :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a few (very superficial) comments.
docs/advanced/pycpp/numpy.rst
Outdated
@@ -41,8 +41,8 @@ completely avoid copy operations with Python expressions like | |||
py::format_descriptor<float>::format(), /* Python struct-style format descriptor */ | |||
2, /* Number of dimensions */ | |||
{ m.rows(), m.cols() }, /* Buffer dimensions */ | |||
{ sizeof(float) * m.rows(), /* Strides (in bytes) for each index */ | |||
sizeof(float) } | |||
{ (ssize_t)( sizeof(float) * m.rows() ),/* Strides (in bytes) for each index */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
strange indentation: the convention used elsewhere is (ssize_t) (sizeof(float) * m.rows())
docs/advanced/pycpp/numpy.rst
Outdated
@@ -121,8 +121,8 @@ as follows: | |||
{ (size_t) m.rows(), | |||
(size_t) m.cols() }, | |||
/* Strides (in bytes) for each index */ | |||
{ sizeof(Scalar) * (rowMajor ? m.cols() : 1), | |||
sizeof(Scalar) * (rowMajor ? 1 : m.rows()) } | |||
{ (ssize_t)( sizeof(Scalar) * (rowMajor ? m.cols() : 1) ), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
include/pybind11/eigen.h
Outdated
if (!convert && !isinstance<array_t<Scalar>>(src)) | ||
return false; | ||
|
||
// Coerce into an array, but don't do type conversion yet; the copy below handle it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
handles it
Thanks, I'll fix these things tonight. |
This adds support for constructing `buffer_info` and `array`s using arbitrary containers or iterators instead of requiring a vector. On its own, this adds quite a few templated constructors which isn't all that desirable, but it's needed by PR pybind#782 to preserve backwards compatibility for previous pybind11 versions that accept a `std::vector<size_t>` for strides. Given a choice between duplicating all the stride-taking constructors (accepting both a ssize_t and size_t vector) and making the whole interface more general, I think this is a better approach. I also seem to recall some discussion a few months ago about wanting something like this, but I can't find the issue/PR where it was mentioned.
This adds support for constructing `buffer_info` and `array`s using arbitrary containers or iterators instead of requiring a vector. On its own, this adds quite a few templated constructors which isn't all that desirable, but it's needed by PR pybind#782 to preserve backwards compatibility for previous pybind11 versions that accept a `std::vector<size_t>` for strides. Given a choice between duplicating all the stride-taking constructors (accepting both a ssize_t and size_t vector) and making the whole interface more general, I think this is a better approach. I also seem to recall some discussion a few months ago about wanting something like this, but I can't find the issue/PR where it was mentioned.
There's another issue here that needs addressing: the current PR is breaking backwards compatibility for I've submitted PR #788 which should address this. It's going to conflict with this PR (because it replaces all the |
(I don't mind fixing the conflicts in #788 if we want to merge this one first--but I think we need both commits to go more or less together). |
That's a really good point. The other public interface changes are the return type of I don't think it will be a huge amount of work to sort out the conflicts, so I wouldn't mind having to fix this PR if you merge yours first. Whatever you prefer. |
This adds support for constructing `buffer_info` and `array`s using arbitrary containers or iterators instead of requiring a vector. This is primarily needed by PR pybind#782 (which makes strides signed to properly support negative strides), but also needs to preserve backwards compatibility with 2.1 and earlier which accepts the strides parameter as a vector of size_t's. Rather than adding nearly duplicate constructors for each stride-taking constructor, it seems nicer to simply allow any type of container (or iterator pairs). This adds iterator pair constructors, and also adds a `detail::any_container` class that handles implicit conversion of arbitrary containers into a vector of the desired type.
The return type of The input is more of an issue because forcing people to change the std::vector<size_t> str{{ 1, 2, 3 }}; is both likely to be more common, and isn't a change that causes any harm, so it's harder to justify breaking the API unnecessarily. Besides all that, it felt a bit restrictive to me to require a specific vector type for input: if my strides were already in a C-array or |
This adds support for constructing `buffer_info` and `array`s using arbitrary containers or iterators instead of requiring a vector. This is primarily needed by PR pybind#782 (which makes strides signed to properly support negative strides), but also needs to preserve backwards compatibility with 2.1 and earlier which accepts the strides parameter as a vector of size_t's. Rather than adding nearly duplicate constructors for each stride-taking constructor, it seems nicer to simply allow any type of container (or iterator pairs). This adds iterator pair constructors, and also adds a `detail::any_container` class that handles implicit conversion of arbitrary containers into a vector of the desired type.
This adds support for constructing `buffer_info` and `array`s using arbitrary containers or iterators instead of requiring a vector. This is primarily needed by PR pybind#782 (which makes strides signed to properly support negative strides), but also needs to preserve backwards compatibility with 2.1 and earlier which accepts the strides parameter as a vector of size_t's. Rather than adding nearly duplicate constructors for each stride-taking constructor, it seems nicer to simply allow any type of container (or iterator pairs). This adds iterator pair constructors, and also adds a `detail::any_container` class that handles implicit conversion of arbitrary containers into a vector of the desired type.
This adds support for constructing `buffer_info` and `array`s using arbitrary containers or iterators instead of requiring a vector. This is primarily needed by PR pybind#782 (which makes strides signed to properly support negative strides), but also needs to preserve backwards compatibility with 2.1 and earlier which accepts the strides parameter as a vector of size_t's. Rather than adding nearly duplicate constructors for each stride-taking constructor, it seems nicer to simply allow any type of container (or iterator pairs). This adds iterator pair constructors, and also adds a `detail::any_container` class that handles implicit conversion of arbitrary containers into a vector of the desired type.
This adds support for constructing `buffer_info` and `array`s using arbitrary containers or iterators instead of requiring a vector. This is primarily needed by PR pybind#782 (which makes strides signed to properly support negative strides), but also needs to preserve backwards compatibility with 2.1 and earlier which accepts the strides parameter as a vector of size_t's. Rather than adding nearly duplicate constructors for each stride-taking constructor, it seems nicer to simply allow any type of container (or iterator pairs). This adds iterator pair constructors, and also adds a `detail::any_container` class that handles implicit conversion of arbitrary containers into a vector of the desired type.
Any thoughts on fully committing to signed types for all buffer/array values? (itemsize, ndim, shape) Having a mix of signed/unsigned can be accident-prone. E.g. if a user omits a cast while multiplying a negative stride with an unsigned (This may, however, raise additional backward compatibility issues.) |
This adds support for constructing `buffer_info` and `array`s using arbitrary containers or iterator pairs instead of requiring a vector. This is primarily needed by PR #782 (which makes strides signed to properly support negative strides, and will likely also make shape and itemsize to avoid mixed integer issues), but also needs to preserve backwards compatibility with 2.1 and earlier which accepts the strides parameter as a vector of size_t's. Rather than adding nearly duplicate constructors for each stride-taking constructor, it seems nicer to simply allow any type of container (or iterator pairs). This works by replacing the existing vector arguments with a new `detail::any_container` class that handles implicit conversion of arbitrary containers into a vector of the desired type. It can also be explicitly instantiated with a pair of iterators (e.g. by passing {begin, end} instead of the container).
#788 is merged (sorry for the conflicts!) |
Nice! It was pretty easy to fix the conflicts. Do you need me to rebase and/or squish commits? |
docs/advanced/pycpp/numpy.rst
Outdated
{ sizeof(float) * m.rows(), /* Strides (in bytes) for each index */ | ||
sizeof(float) } | ||
{ (ssize_t)(sizeof(float) * m.rows()), /* Strides (in bytes) for each index */ | ||
(ssize_t)(sizeof(float)) } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The (ssize_t)
cast shouldn't be needed here anymore.
docs/advanced/pycpp/numpy.rst
Outdated
{ sizeof(Scalar) * (rowMajor ? m.cols() : 1), | ||
sizeof(Scalar) * (rowMajor ? 1 : m.rows()) } | ||
{ (ssize_t)(sizeof(Scalar) * (rowMajor ? m.cols() : 1)), | ||
(ssize_t)(sizeof(Scalar) * (rowMajor ? 1 : m.rows())) } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
... or here.
include/pybind11/buffer_info.h
Outdated
@@ -36,11 +36,11 @@ struct buffer_info { | |||
} | |||
|
|||
buffer_info(void *ptr, size_t itemsize, const std::string &format, size_t size) | |||
: buffer_info(ptr, itemsize, format, 1, size, itemsize) { } | |||
: buffer_info(ptr, itemsize, format, 1, size, (ssize_t) itemsize) { } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
... or here (and so on).
A rebase + squash into fewer commits will be needed eventually, but it can probably wait until it's ready to merge. I think this looks pretty good now: the one outstanding bit is whether we should apply the same treatment to |
It is probably better to make it all the same type. I'm not sure how much time that will take, I'll try to do it tomorrow or Monday. I'm afraid it'll require adding tests for negativity in multiple places, and it'll be easy to miss those places... |
I have made |
This looks great! (I haven't yet gone through everything, but I think this is getting close). @wjakob @dean0x7d @aldanor thoughts? I also built this against the 2.1.1 test suite, just to see what would happen, and it (almost*) passed, albeit with a slew of comparison between signed and unsigned warnings from where the tests are using (* - almost in that there were two acceptable failures: the vectorize test that directly calls into py::detail::broadcast didn't compile (it needed types changed to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@crisluengo This is great. Awesome effort going through all the signedness changes -- there were quite a few! I've left only very minor comments.
Regarding backward compatibility, it looks like any_container
resolves the majority of the pain, so the only point where issues could arise would be by accessing the raw attributes. E.g., someone using the raw strides
with -Wconversion -Werror
but IMO that's a justified breaking error since strides can be negative and it's more dangerous not to take it into account.
include/pybind11/class_support.h
Outdated
@@ -349,7 +349,7 @@ inline void enable_dynamic_attributes(PyHeapTypeObject *heap_type) { | |||
#endif | |||
type->tp_flags |= Py_TPFLAGS_HAVE_GC; | |||
type->tp_dictoffset = type->tp_basicsize; // place dict at the end | |||
type->tp_basicsize += sizeof(PyObject *); // and allocate enough space for it | |||
type->tp_basicsize += (Py_ssize_t)sizeof(PyObject *); // and allocate enough space for it |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think most uses of Py_ssize_t
in pybind11 are historic and ssize_t
is preferable now for consistency. (Although, I don't think this line was triggering any warnings even without the cast.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GCC with -Wsign-conversion does give a warning there without the cast. Assuming 2's complement, the operation yields the same result regardless, but I think it's better to do the operation as a signed addition.
include/pybind11/eigen.h
Outdated
@@ -200,14 +207,13 @@ template <typename Type_> struct EigenProps { | |||
// Casts an Eigen type to numpy array. If given a base, the numpy array references the src data, | |||
// otherwise it'll make a copy. writeable lets you turn off the writeable flag for the array. | |||
template <typename props> handle eigen_array_cast(typename props::Type const &src, handle base = handle(), bool writeable = true) { | |||
constexpr size_t elem_size = sizeof(typename props::Scalar); | |||
constexpr ssize_t elem_size = (ssize_t) sizeof(typename props::Scalar); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
auto
to avoid repeating the type.
include/pybind11/numpy.h
Outdated
size_t itemsize() const { | ||
return (size_t) detail::array_descriptor_proxy(m_ptr)->elsize; | ||
ssize_t itemsize() const { | ||
return (ssize_t) detail::array_descriptor_proxy(m_ptr)->elsize; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unnecessary cast: already signed.
include/pybind11/numpy.h
Outdated
size_t itemsize() const { | ||
return (size_t) detail::array_descriptor_proxy(detail::array_proxy(m_ptr)->descr)->elsize; | ||
ssize_t itemsize() const { | ||
return (ssize_t) detail::array_descriptor_proxy(detail::array_proxy(m_ptr)->descr)->elsize; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also here.
include/pybind11/numpy.h
Outdated
size_t ndim() const { | ||
return (size_t) detail::array_proxy(m_ptr)->nd; | ||
ssize_t ndim() const { | ||
return (ssize_t) detail::array_proxy(m_ptr)->nd; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And here.
include/pybind11/numpy.h
Outdated
trivial_broadcast_c && shape_iter != end; ++shape_iter, ++stride_iter) { | ||
auto shape_iter = buffers[i].shape.crbegin(); | ||
auto stride_iter = buffers[i].strides.crbegin(); | ||
for (; trivial_broadcast_c && shape_iter != end; ++shape_iter, ++stride_iter) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this purely a style change or is there some intent I'm not seeing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We changed the strides to signed first, which meant that shape and stride were different types. It's no longer necessary.
docs/advanced/pycpp/numpy.rst
Outdated
@@ -37,7 +37,7 @@ completely avoid copy operations with Python expressions like | |||
.def_buffer([](Matrix &m) -> py::buffer_info { | |||
return py::buffer_info( | |||
m.data(), /* Pointer to buffer */ | |||
sizeof(float), /* Size of one scalar */ | |||
(py::ssize_t)sizeof(float), /* Size of one scalar */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you sure an explicit cast is necessary here? As far as I know, compilers don't issue warnings for compile-time constants which are guaranteed to fit into the destination type:
#include <cstddef>
void foo(int) {}
int main() {
foo(sizeof(float)); // clean (no warnings)
size_t s = 4;
foo(s); // <-- conversion warning here
}
GCC, clang and MSVC for example: https://godbolt.org/g/zgibas
This might help avoid some of the casts on the user side.
@dean0x7d , thanks so much for the code review! Will fix these things and solve conflicts with main branch today. |
include/pybind11/numpy.h
Outdated
@@ -767,6 +769,9 @@ template <typename T, int ExtraFlags = array::forcecast> class array_t : public | |||
explicit array_t(ShapeContainer shape, const T *ptr = nullptr, handle base = handle()) | |||
: array(std::move(shape), ptr, base) { } | |||
|
|||
explicit array_t(size_t count, const T *ptr = nullptr, handle base = handle()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this be ssize_t
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking of that. This is a new constructor from the merge with the master branch. I wasn't sure what to do with it. If count
is ssize_t
, do we need a test for non-negativity?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not exactly new (here's the same constructor in 2.1.1), it just disappeared between #788 and #822. I tried to handle it in #788 by having any_container
accept single values, but that broke 2-element initializer_list construction (due to ambiguity; see #822), so #822 reintroduced the older singleton value constructors.
Yes, I think throwing an exception here for a negative value is warranted; std::invalid_argument
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Numpy should already be doing this check for us in PyArray_NewFromDescr
. Perhaps pybind11_fail
should just be replaced with error_already_set
to relay the original error message.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 for changing the pybind11_fail
to error_already_set
in the main constructor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this what you mean?a06f2f7
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup. I suggest also adding a one-line test that tries to create with -1, just to verify that the exception is thrown and comes through properly, but that should work.
Looks good to me! There's a flake8 style error on Travis. Just fix that and squash the PR to remove the "merge from master" commits and I think this is good to merge. |
`EigenConformable::stride_compatible` returns false if the strides are negative. In this case, do not use `EigenConformable::stride`, as it is {0,0}. We cannot write negative strides in this element, as Eigen will throw an assertion if we do. The `type_caster` specialization for regular, dense Eigen matrices now does a second `array_t::ensure` to copy data in case of negative strides. I'm not sure that this is the best way to implement this. I have added "TODO" tags linking these changes to Eigen bug #747, which, when fixed, will allow Eigen to accept negative strides.
We're current copy by creating an Eigen::Map into the input numpy array, then assigning that to the basic eigen type, effectively having Eigen do the copy. That doesn't work for negative strides, though: Eigen doesn't allow them. This commit makes numpy do the copying instead by allocating the eigen type, then having numpy copy from the input array into a numpy reference into the eigen object's data. This also saves a copy when type conversion is required: numpy can do the conversion on-the-fly as part of the copy. Finally this commit also makes non-reference parameters respect the convert flag, declining the load when called in a noconvert pass with a convertible, but non-array input or an array with the wrong dtype.
@jagerman @dean0x7d I have rebased onto master and squashed all but 3 commits. I cannot guarantee that the first two commits are as they were originally, it is possible things got a little mixed up because of the move of |
Merged, thanks @crisluengo! |
It's a bit earlier to say for sure, but the Eigen bug tracking this got marked today as a blocking-3.4 feature, so, assuming it doesn't run into major problems on the Eigen side, it looks like negative strides will work there soon too. |
I have made changes such that strides are stored as
ssize_t
. I have added only one test case, I'd like advice on where else this feature needs to be tested.Eigen does not allow for negative strides, and will throw if you pass a numpy array
[::-1]
to it. I don't think the interface needs to test for this anywhere, since Eigen will test. Unless there is a feature that can copy the array if the data doesn't fit?