Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG]: Handling of containers (e.g. std::vector) of Eigen::Ref<const T> #682

Closed
ManifoldFR opened this issue Aug 13, 2024 · 18 comments
Closed

Comments

@ManifoldFR
Copy link

ManifoldFR commented Aug 13, 2024

Problem description

Hello,

I would like to report a bug which happens under specific circumstances:

  1. from-Python conversions to types like std::vector<Eigen::Ref<const MatrixXd>> (with the default, fixed inner stride),
  2. when the Numpy arrays are (non-contiguous) slices of the same array with the same dimensions (see example code),
  3. when using versions of Eigen predating this PR on gitlab (which added the much needed move constructors).

Following @WKarel's work on PR #215, points 1-2 will lead to the creation of a temporary owned object in the Ref::m_object protected member. This happens on this line, because DMap has dynamic strides, and the Eigen::Ref has fixed inner stride hence the temporary is created:

return Ref(dcaster.operator DMap());

I know of two mitigations for now:

  • using a version of Eigen that has the move constructors for Eigen::Ref
  • use nb::DRef which was built to avoid copies

Reproducible example code

using MatXdR = Eigen::Matrix<double, Eigen::Dynamic, Eigen::Dynamic, Eigen::RowMajor>;
using MatRefXdR = Eigen::Ref<const MatXdR>;
using Vec_MatRefXdR = std::vector<MatRefXdR>;
m.def("printRefs", [](const Vec_MatRefXdR& ms) {
     std::cout << "Got matrix\n" << ms[0] << std::endl;
     std::cout << "Got matrix\n" << ms[1] << std::endl;
      // passthrough
      std::vector<MatXdR> out(ms.size());
      for (size_t i = 0; i < ms.size(); i++) {
           out[i] = ms[i];
      }
      return out;
});
# in test_eigen.py
@needs_numpy_and_eigen
def test_vec_ref():
    print()
    np.set_printoptions(precision=4, linewidth=200)
    a = np.random.randn(8, 4).T  # .T ensures "wrong" order for slices
    s1 = a[:2]
    s2 = a[2:]
    print(s1)
    print(s2)
    r1, r2 = t.printRefs([s1, s2])
    print(r1)
    print(r2)
    assert np.allclose(s1, r1)
    assert np.allclose(s2, r2)

The code prints the following result (and the asserts fail)

[[-0.7264  0.4645 -1.3106  1.3194 -1.5611 -1.6445 -0.6314  0.6161]
 [ 0.6566 -0.5272  1.5168  1.2926  0.6595  0.6201 -0.8453  1.0318]]
[[-0.6287  0.4879 -0.8519 -0.2426 -0.0305  0.5116 -1.189  -1.3493]
 [-0.4122 -1.2882 -1.1801  1.0621  0.8644  1.1376 -0.7016  0.2041]]
Got matrix
 -0.628672    0.48795  -0.851887  -0.242632 -0.0305066   0.511559     -1.189   -1.34933
 -0.412173   -1.28822   -1.18013    1.06208   0.864423    1.13762  -0.701575   0.204068
Got matrix
 -0.628672    0.48795  -0.851887  -0.242632 -0.0305066   0.511559     -1.189   -1.34933
 -0.412173   -1.28822   -1.18013    1.06208   0.864423    1.13762  -0.701575   0.204068
[[-0.6287  0.4879 -0.8519 -0.2426 -0.0305  0.5116 -1.189  -1.3493]
 [-0.4122 -1.2882 -1.1801  1.0621  0.8644  1.1376 -0.7016  0.2041]]
[[-0.6287  0.4879 -0.8519 -0.2426 -0.0305  0.5116 -1.189  -1.3493]
 [-0.4122 -1.2882 -1.1801  1.0621  0.8644  1.1376 -0.7016  0.2041]]
@wjakob
Copy link
Owner

wjakob commented Aug 13, 2024

Is this an issue that nanobind can actually fix? If not, it might be better to include a static_assert that errors out when compiling with an older version of Eigen.

@ManifoldFR
Copy link
Author

ManifoldFR commented Aug 13, 2024

I think there could be a workaround where the temp storage is managed by Eigen::Ref's type caster itself, which should be kept alive (we can gate this at compile-time behind a check for the version of Eigen, the next release of Eigen won't need that).

This means other casters (e.g. optional_caster or the list_caster) shouldn't create the inner caster on the fly like here but always store them as data members.
This would also propagate to collections - for instance the list caster would have to store a list of inner casters I guess - not sure about the memory footprint for this, either. (I tried this on my local checkout of nanobind and it works).

@wjakob
Copy link
Owner

wjakob commented Aug 16, 2024

The change you propose to make casters store their member by default are likely impossible because this breaks creation of containers that store references (and it would in any case be too intrusive of a change, if "only" justified by the Eigen type caster). Perhaps a good stop-gap solution would be to raise a compile-time error message and ask users to upgrade their version of Eigen? If you agree, could you create a PR that adds such an error message?

@ManifoldFR
Copy link
Author

The change you propose to make casters store their member by default are likely impossible because this breaks creation of containers that store references (and it would in any case be too intrusive of a change, if "only" justified by the Eigen type caster).

Hm, you are right it would be intrusive and create tons of problems, and looking back to how pybind11 works, it didn't do that, either.

Perhaps a good stop-gap solution would be to raise a compile-time error message and ask users to upgrade their version of Eigen? If you agree, could you create a PR that adds such an error message?

Unfortunately I don't think this could work. Lots of things depend on Eigen and would need consistent versions of it for interop, and the move ctor is still on the master branch (with no release in sight yet, I've asked around on the Discord).

Over the past week, what I've done instead was check how pybind11 did it, which is through managing a copy (a numpy copy created in C++ and managed with pybind's life support mechanism).
I think we can reproduce this in nanobind by creating a copy ndarray with the temp memory and the right strides (fitting the desired column- or row-major order in Eigen), and manage it through a capsule that's added to the cleanup arg in from_python. I'll open a PR so you can take a look.

@WKarel
Copy link
Contributor

WKarel commented Aug 21, 2024

I see that

  • the proposed changes are considerable, and only necessary until the next release of Eigen, while
  • it may take a long time for Eigen to publish its next release.

A more compact and local (but untested) alternative that comes to my mind could be: derive a struct from Ref<const> with a move-Ctor that moves Ref<const>::m_object. This should be possible, because Ref<const>::m_object is not private, but protected. Something along the lines of:

#if EIGEN_VERSION <= 3.4.0

template<class CRef_>
struct MovableCRef : CRef_
{
  using CRef_::CRef_;
  MovableCRef::MovableCRef(CRef_&& that) 
  {
    this->m_object = std::move(that.m_object);
  }
  MovableCRef::MovableCRef(MovableCRef&& that) 
  {
    this->m_object = std::move(that.m_object);
  }
};

template<class Ref_>
using MovableRef = std::conditional_t<Ref_::is_const, MovableCRef<Ref_>, Ref_>;

#else

template<class Ref_>
using MovableRef = Ref_;

#endif

And use/return MovableRef whereever Ref<const>::m_object must be moved. Since MovableRef does not add data members, slicing (when returning MovableRef) should be no problem. However, I'm unsure whether changes could be limited to dense.h with this alternative, or changes would also need to be introduced into detail/nb_list.h et al. - in which case this alternative would be unacceptable, I guess.

@ManifoldFR
Copy link
Author

ManifoldFR commented Aug 21, 2024

I see that

  • the proposed changes are considerable, and only necessary until the next release of Eigen, while
  • it may take a long time for Eigen to publish its next release.

I agree, the changes are considerable because they aim to reproduce the workaround that pybind11 uses, but I'm sure an alternative is possible.
The last release of Eigen was exactly three years ago (Aug 18, 2021), and it will be the only one available on some package managers (in particular apt for Debian and Ubuntu {22,24} LTS) for the foreseeable future. So if we gatekeep this feature to a future Eigen > 3.4 many users and devs would be left out for very, very long (it also means we can't move our projects over from pybind or boost.python).

A more compact and local (but untested) alternative that comes to my mind could be: derive a struct from Ref<const> with a move-Ctor that moves Ref<const>::m_object. This should be possible, because Ref<const>::m_object is not private, but protected. Something along the lines of:

#if EIGEN_VERSION <= 3.4.0

template<class CRef_>
struct MovableCRef : CRef_
{
  using CRef_::CRef_;
  MovableCRef::MovableCRef(CRef_&& that) 
  {
    this->m_object = std::move(that.m_object);
  }
  MovableCRef::MovableCRef(MovableCRef&& that) 
  {
    this->m_object = std::move(that.m_object);
  }
};

template<class Ref_>
using MovableRef = std::conditional_t<Ref_::is_const, MovableCRef<Ref_>, Ref_>;

#else

template<class Ref_>
using MovableRef = Ref_;

#endif

And use/return MovableRef whereever Ref<const>::m_object must be moved. Since MovableRef does not add data members, slicing (when returning MovableRef) should be no problem. However, I'm unsure whether changes could be limited to dense.h with this alternative, or changes would also need to be introduced into detail/nb_list.h et al. - in which case this alternative would be unacceptable, I guess.

Actually this is a great suggestion. I'll try this in a new branch on my fork and see if/how it works.

@ManifoldFR
Copy link
Author

ManifoldFR commented Aug 21, 2024

@WKarel I tried your suggestion. Unfortunately it would require intrusive changes to nb_list.h to work (and I wouldn't know which kind of change to make), or MovableRef would need to be replace Eigen::Ref(...) everywhere in the wrapping code.

The problem here is on this line:

value.push_back(caster.operator cast_t<Entry>());

In this context, the template parameters are List = std::vector<Eigen::Ref<const T>>, Entry = Eigen::Ref<const T>, and the called function will be vector::push_back(const Entry&) (the copy version since there is no move ctor here and since the list caster is not aware the object is a actually a MovableCRef).

@wjakob
Copy link
Owner

wjakob commented Aug 22, 2024

FYI, this could perhaps be fixed by changing template <typename T_> using Cast = Ref; declaration to template <typename T_> using Cast = movable_cast_t<T_>; within the Ref type caster. (And similar, for Map)

@ManifoldFR
Copy link
Author

ManifoldFR commented Aug 22, 2024

FYI, this could perhaps be fixed by changing template <typename T_> using Cast = Ref; declaration to template <typename T_> using Cast = movable_cast_t<T_>; within the Ref type caster. (And similar, for Map)

I tried this, changing the cast operator to operator MovableRef<Ref>&&(), but this triggered std::bad_alloc memory errors.

This wouldn't change anything anyway, because at the call site for the cast operator in nb_list.h, std::vector::push_back() takes either const Entry& or Entry&&. So whatever the cast operator returns gets converted to a reference to Entry (which is the stored type Eigen::Ref here).

@ManifoldFR
Copy link
Author

I have tried other things but I don't think we can avoid having to do the same trick as pybind11 - creating our own temp memory and tying it to the cleanup list. However I think we can gate this behaviour behind a check for a version of Eigen higher or equal to the current master branch (3.4.90).

What do you think @WKarel ?

@WKarel
Copy link
Contributor

WKarel commented Aug 25, 2024

I just tried it, and my idea does not work with STL containers using Visual C++, because e.g. std::vector::push_back uses std::allocator::construct, which copy-constructs its argument, since Ref<T const> has no move constructor.

But apart from how to bind std::vector<Ref<T const>> to Python, what is your use-case for it? Ref<T const>s copy constructor is generally not fit for use with STL containers, because it does not copy m_object. And without even a move constructor, what would you (safely) do with a vector of it?

Using Eigen's current release:

using MatXdC = Eigen::Matrix<double, Eigen::Dynamic, Eigen::Dynamic, Eigen::ColMajor>;
using MatXdR = Eigen::Matrix<double, Eigen::Dynamic, Eigen::Dynamic, Eigen::RowMajor>;
std::vector<Eigen::Ref<const MatXdR>> v;
{
  MatXdC m{ { 1., 2. } };
  Eigen::Ref<const MatXdR> r(m);  // r::m_object copy-initialized, and r points to it.
  v.push_back(std::move(r));  // v[0]::m_object not initialized. v[0] points to r::m_object.
}
v[0](0, 0);  // accesses destroyed memory.

@ManifoldFR
Copy link
Author

I just tried it, and my idea does not work with STL containers using Visual C++, because e.g. std::vector::push_back uses std::allocator::construct, which copy-constructs its argument, since Ref<T const> has no move constructor.

But apart from how to bind std::vector<Ref<T const>> to Python, what is your use-case for it? Ref<T const>s copy constructor is generally not fit for use with STL containers, because it does not copy m_object. And without even a move constructor, what would you (safely) do with a vector of it?

I wasn't aware that Eigen::Ref<const T> was not fit for use with STL containers, because I didn't know about the m_object temporary.
We use STL containers of Refs in some places, namely optional<Eigen::Ref<const T>> in ProxQP for passing QP parameters as optional arguments.

Using Eigen's current release:

using MatXdC = Eigen::Matrix<double, Eigen::Dynamic, Eigen::Dynamic, Eigen::ColMajor>;
using MatXdR = Eigen::Matrix<double, Eigen::Dynamic, Eigen::Dynamic, Eigen::RowMajor>;
std::vector<Eigen::Ref<const MatXdR>> v;
{
  MatXdC m{ { 1., 2. } };
  Eigen::Ref<const MatXdR> r(m);  // r::m_object copy-initialized, and r points to it.
  v.push_back(std::move(r));  // v[0]::m_object not initialized. v[0] points to r::m_object.
}
v[0](0, 0);  // accesses destroyed memory.

I think std::optional behaves the same, causing this kind of undefined behaviour.

@WKarel
Copy link
Contributor

WKarel commented Aug 26, 2024

std::optional<Eigen::Ref<const T>> makes more sense to me. But indeed, the problem is the same.

As a work-around specifically for std::optional<Eigen::Ref<const T>>, you may define your own partial specialization of type_caster that constructs the Ref from a Map in-place:

NAMESPACE_BEGIN(NB_NAMESPACE)
NAMESPACE_BEGIN(detail)

template<typename T>
struct type_caster<std::optional<Eigen::Ref<const T>>> : optional_caster<std::optional<Eigen::Ref<const T>>>
{
  using Caster = typename type_caster::optional_caster::Caster;
  using Map = typename Caster::Map;
  using DMap = typename Caster::DMap;

  bool from_python(handle src, uint8_t flags, cleanup_list* cleanup) noexcept {
    if (src.is_none())
      return true;
    Caster caster;
    if (!caster.from_python(src, flags_for_local_caster<T>(flags), cleanup) ||
        !caster.template can_cast<T>())
      return false;
    if (caster.dcaster.caster.value.is_valid())
      value.emplace(caster.dcaster.operator DMap());
    else
      value.emplace(caster.caster.operator Map());
    return true;
  }
};

NAMESPACE_END(detail)
NAMESPACE_END(NB_NAMESPACE)

@ManifoldFR
Copy link
Author

ManifoldFR commented Aug 27, 2024

std::optional<Eigen::Ref<const T>> makes more sense to me. But indeed, the problem is the same.

As a work-around specifically for std::optional<Eigen::Ref<const T>>, you may define your own partial specialization of type_caster that constructs the Ref from a Map in-place:

NAMESPACE_BEGIN(NB_NAMESPACE)
NAMESPACE_BEGIN(detail)

template<typename T>
struct type_caster<std::optional<Eigen::Ref<const T>>> : optional_caster<std::optional<Eigen::Ref<const T>>>
{
  using Caster = typename type_caster::optional_caster::Caster;
  using Map = typename Caster::Map;
  using DMap = typename Caster::DMap;

  bool from_python(handle src, uint8_t flags, cleanup_list* cleanup) noexcept {
    if (src.is_none())
      return true;
    Caster caster;
    if (!caster.from_python(src, flags_for_local_caster<T>(flags), cleanup) ||
        !caster.template can_cast<T>())
      return false;
    if (caster.dcaster.caster.value.is_valid())
      value.emplace(caster.dcaster.operator DMap());
    else
      value.emplace(caster.caster.operator Map());
    return true;
  }
};

NAMESPACE_END(detail)
NAMESPACE_END(NB_NAMESPACE)

This sounds like a reasonable solution, I'll check it out. I could probably do the same with list_caster and make my own library for nanobind+Eigen+STL patches (unless you two are open to upstreaming these patches into nanobind?).

@WKarel
Copy link
Contributor

WKarel commented Aug 27, 2024

I am not in the position of accepting PRs here. The only way for nanobind itself to support what you need using the current release of Eigen seems to require your rather comprehensive changes, which @wjakob seems to at least dislike. So I've tried to provide a solution outside of it.

Eigen::Ref<const T> is a rather weird citizen, because it can either map its own memory, or some external one (which is hardly possible to tell from the outside), and when copying, the copy may reference the memory of the original, even if the original owns it. Hence, to be safe when making copies, one always needs to keep alive the originals.
That can be fine when creating copies on the stack (possibly as function arguments). But e.g. std::vector::emplace_back may re-allocate its array and move existing elements onto the new one. Without a proper move constructor, this will copy-construct Eigen::Ref<const T>s on the new array and delete the originals on the old one - bang! This may be prevented e.g. within nanobind by pre-allocating the needed capacity. However, I would think twice before broadly using std::vector<Eigen::Ref<const T>> in my own code.

I guess the intent for its unusual copy semantics is to make it cheap to call functions that expect Eigen::Ref<const T> as function argument (which it really is, unless its m_object is an array of large, static size). Otherwise, functions would need to expect const Eigen::Ref<constT>& as arguments - which is a little longer to read, but certainly fits much better into C++. Even the Eigen docs seem to prefer the latter. I think that Eigen::Ref<const T> should have its copy constructor deleted, but it is probably too late now.

@ManifoldFR
Copy link
Author

ManifoldFR commented Aug 27, 2024

I am not in the position of accepting PRs here. The only way for nanobind itself to support what you need using the current release of Eigen seems to require your rather comprehensive changes, which @wjakob seems to at least dislike. So I've tried to provide a solution outside of it.

That's fine then. I'll just apply fixes within our own codebase.
edit: your suggestion for overriding the std::optional type_caster worked wonderfully well, @WKarel.

Eigen::Ref<const T> is a rather weird citizen, because it can either map its own memory, or some external one (which is hardly possible to tell from the outside), and when copying, the copy may reference the memory of the original, even if the original owns it. Hence, to be safe when making copies, one always needs to keep alive the originals. That can be fine when creating copies on the stack (possibly as function arguments). But e.g. std::vector::emplace_back may re-allocate its array and move existing elements onto the new one. Without a proper move constructor, this will copy-construct Eigen::Ref<const T>s on the new array and delete the originals on the old one - bang! This may be prevented e.g. within nanobind by pre-allocating the needed capacity. However, I would think twice before broadly using std::vector<Eigen::Ref<const T>> in my own code.

You're right, since vectors can move it would be dangerous. I think I will suggest to my org we outright ban STL vectors (and I guess, sets, and maps?) for Eigen::Ref<const...> outside of optional.

I guess the intent for its unusual copy semantics is to make it cheap to call functions that expect Eigen::Ref<const T> as function argument (which it really is, unless its m_object is an array of large, static size). Otherwise, functions would need to expect const Eigen::Ref<constT>& as arguments - which is a little longer to read, but certainly fits much better into C++. Even the Eigen docs seem to prefer the latter.

This makes sense, I usually do const Eigen::Ref<const T>& in my own code. I guess the copy semantics are a good safeguard when the user has to use something like a container

I think that Eigen::Ref<const T> should have its copy constructor deleted, but it is probably too late now.

Instead, there's an open MR on Gitlab which seeks to remove the temporary m_object instead (as an option in the template parameters).

@wjakob
Copy link
Owner

wjakob commented Aug 29, 2024

What's the resolution? Shall I close the issue?

@ManifoldFR
Copy link
Author

ManifoldFR commented Aug 29, 2024

What's the resolution? Shall I close the issue?

We concluded that std::vector<Eigen::Ref> is in fact dangerous due to the temporary and weird copy semantics (even when there is a move ctor), and aside from my intrusive proposal for the type_caster there is no safe way of operating it (even in C++ itself).
I had a similar issue downstream with std::optional which I've ended up solving by specializing the type_caster.

Thank you for the discussion, I'll be closing this issue and my PR!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants