Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GDExtension Ref<T> object freeing error. #652

Closed
kidrigger opened this issue Nov 16, 2021 · 11 comments
Closed

GDExtension Ref<T> object freeing error. #652

kidrigger opened this issue Nov 16, 2021 · 11 comments
Labels
bug This has been identified as a bug topic:gdextension This relates to the new Godot 4 extension implementation

Comments

@kidrigger
Copy link

Background Info:

Working on adding GDExtension support for Video playback.
One part of the VideoStream is Ref<VideoStreamPlayback> VideoStream::instance_playback()

For the VideoStreamExtension class this is implemented as

Ref<VideoStreamPlayback> VideoStreamExtension::instance_playback() {
	Ref<VideoStreamPlaybackExtension> ref;
	if (GDVIRTUAL_CALL(_instance_playback, ref)) {
		ERR_FAIL_COND_V_MSG(ref.is_null(), nullptr, "Plugin returned null playback");
		// open_file() is an unregistered private method in VideoStreamPlaybackExtension
                ref->open_file(file);  
                // set_audio_track() is a virtual method implemented with GDVIRTUAL_CALL
		ref->set_audio_track(audio_track); 
		return ref;
	}
	return nullptr;
}

On the GDExtension plugin side:

godot::Ref<godot::VideoStreamPlaybackExtension> godot::VideoStreamReference::_instance_playback() {
	Ref<VideoStreamPlaybackReference> ref;
	ref.instantiate();
        // will add the temp fix here.
	return ref;
}

The Issue:

The above implementation crashes with:

Exception thrown: read access violation.
**VideoStreamPlayback::set_audio_track[virtual]** was 0xFFFFFFFFFFFFFFFF.

Stepping through in the debugger shows the VideoStreamPlaybackReference::_cleanup (Destructor) is called on return from

// binder_common.hpp
void call_with_ptr_args_ret_helper(T *p_instance, R (T::*p_method)(P...), const GDNativeTypePtr *p_args, void *r_ret, IndexSequence<Is...>) {
	PtrToArg<R>::encode((p_instance->*p_method)(PtrToArg<P>::convert(p_args[Is])...), r_ret);
}

As the Ref counts to 0.

If one stores the Ref in the plugin: Godot and the Plugin both has separate 'ownership' of the object. As such after cleanup on one end - the other Ref dangles.

Temporary Solution:

RefCounted* r = static_cast<RefCounted*>(ref.ptr());
r->reference();

Adding this increments the refcount and prevents the deletion after encode.
The Ref quietly deletes and reduces refcount to 1 which is copied and owned by Godot

Works for this particular case since it's an ownership transfer. But it will not work in Shared ownership case as RefCounted object seems to not be shared across the boundary.

Related: #417

@Calinou Calinou added bug This has been identified as a bug topic:gdextension This relates to the new Godot 4 extension implementation labels Nov 16, 2021
@BastiaanOlij
Copy link
Collaborator

Haven't run into this myself but I have a feeling I soon will. We had similar issues with returning String and Object but not sure how @vnen fixed those.

@kidrigger
Copy link
Author

kidrigger commented Nov 17, 2021

Haven't run into this myself but I have a feeling I soon will. We had similar issues with returning String and Object but not sure how @vnen fixed those.

I haven't pushed my code yet - cleaning up a little. But I'll have buildable reference plugin for debugging.
I'll look at how to solve this too.

Need to see what's going wrong with RefCounted

@Zylann
Copy link
Collaborator

Zylann commented Nov 19, 2021

If returning a ref from a function makes it dangling, then how is such usage even working in core? What I think is happening in Godot, is that on return, a copy of Ref is made, which does implicitely increments refcount for the duration of the return, and that does not seem to happen properly here (although move semantics and some compiler optimizations could reduce this because this increment is fairly redundant at the end of the day).

It worked in GDNative because Ref return values were wrapped inside a Variant. Variant had a constructor taking a pointer bindings-side, and that's where the ownership was picked by Godot, in addition ot Ref having a Variant cast

operator Variant() const {

@kidrigger
Copy link
Author

Attaching the callstack to the location of memdelete in the ref.
During the return process at the end of the call_with_ptr_args_ret_helper the memdelete is called on the data in Ref<>

As I have noted previously - increasing the RefCounted's count by 1 solves this issue without any memory leaks.
I still need to test this on a small reproduction. Will keep posted.

libgdvideo.dll!godot::Ref<godot::VideoStreamPlaybackExtension>::unref() Line 216	C++
libgdvideo.dll!godot::Ref<godot::VideoStreamPlaybackExtension>::~Ref<godot::VideoStreamPlaybackExtension>() Line 228	C++
libgdvideo.dll!godot::call_with_ptr_args_ret_helper<godot::VideoStreamReference,godot::Ref<godot::VideoStreamPlaybackExtension>>(godot::VideoStreamReference * p_instance, godot::Ref<godot::VideoStreamPlaybackExtension>(godot::VideoStreamReference::*)() p_method, void * const * p_args, void * r_ret, IndexSequence<> __formal) Line 163	C++
libgdvideo.dll!godot::call_with_ptr_args<godot::VideoStreamReference,godot::Ref<godot::VideoStreamPlaybackExtension>>(godot::VideoStreamReference * p_instance, godot::Ref<godot::VideoStreamPlaybackExtension>(godot::VideoStreamReference::*)() p_method, void * const * p_args, void * r_ret) Line 183	C++
libgdvideo.dll!godot::VideoStreamExtension::register_virtuals::__l10::<lambda>(void * p_instance, void * const * p_args, void * p_ret) Line 70	C++
libgdvideo.dll!void <lambda>(void *, void * const *, void *)::<lambda_invoker_cdecl>(void * p_instance, void * const * p_args, void * p_ret) Line 70	C++
godot.windows.tools.64.exe!VideoStreamExtension::_gdvirtual__instance_playback_call(Ref<VideoStreamPlaybackExtension> & r_ret) Line 113	C++
godot.windows.tools.64.exe!VideoStreamExtension::instance_playback() Line 196	C++

@Zylann
Copy link
Collaborator

Zylann commented Nov 19, 2021

I think the culprit might be around PtrToArg<R>::encode. We call the p_method, which returns a Ref<T>, which is given to encode. encode pretty much assigns r_ret with that return value. Problem: r_ret is a raw pointer. So it heavily depends on the proper template being selected. I suspect encode is only picking up the pointer to the object, without increasing refcount, putting it inside r_ret... but Godot was not involved in the assignment somehow, no ownership is passed (in GDNative, ownership was passed if the destination was a Variant for example). Then the call to call_with_ptr_args_ret_helper has to complete, which means the temporary Ref<T> created to hold the return value gets destroyed before the function returns, causing the object to get destroyed.

To be sure if that really happens, we have to step and see what happens in encode. I dont see a specialization of PtrToArg taking Object*, so I assume it would be seen as Variant? Maybe Variant has a bug? Or does Godot expect us to cast to Ref<T> since thats the method signature here?:

	Ref<VideoStreamPlaybackExtension> ref;
	if (GDVIRTUAL_CALL(_instance_playback, ref)) {

(there is no specialization of PtrToArg for Ref)
Needs investigation.

We also need to make sure that a fix that "doesnt crash" would not create a leak on the other hand. When Godot gets the reference, we need to be sure it's the only one at the end. So it's possible that we would not need to call reference() at all from the bindings side (as was the case in GDNative).

@kidrigger
Copy link
Author

kidrigger commented Nov 19, 2021

I've run through the calls - Variant never enters the picture.
This is the encode() selected. There seems to be a specialization of PtrToArg for Ref

template <class T>
struct PtrToArg<Ref<T>> {
	_FORCE_INLINE_ static Ref<T> convert(const void *p_ptr) {
		return Ref<T>(godot::internal::gdn_interface->object_get_instance_binding((void *)p_ptr, godot::internal::token, &T::___binding_callbacks));
	}

	typedef Ref<T> EncodeT;

	_FORCE_INLINE_ static void encode(Ref<T> p_val, const void *p_ptr) {
		*(void **)p_ptr = p_val->_owner;
	}
};

Currently - increasing reference on the plugin side doesn't cause a leak (at not reported by ClassDB) as long as the reference is not stored in plugin. I haven't tested with storing but I can test that.

I am stuck with end semester evaluations so I won't be able to investigate this until mid December. I will however push my plugin and Godot code to allow easy reproduction in a couple of days.

@Zylann
Copy link
Collaborator

Zylann commented Nov 19, 2021

Ah, you're right, such specialization exists. I was only looking in method_ptrcall.hpp, thought all the specializations were there...

_FORCE_INLINE_ static void encode(Ref<T> p_val, const void *p_ptr) {
*(void **)p_ptr = p_val->_owner;
}

This indeed does not give any ownership. So it's like being given a Ref<T> by reference, but directly assigning the pointer inside this Ref without going through operator=.

If Godot unpacks this as a Ref<T> without actually creating one with a constructor or operator=, then the effect of the assignment operator has to be replicated here. So the idea is this:

*(Ref<T>*)p_ptr = p_val;

This is, of course, specifically when Godot actually does expect the return value to be Ref<T>, and not T*.
Note that Godot itself does this in its own PtrToArg implementation (I dont have the link rn)

However, the current code does ->_owner. And that's because pointers between Godot and the wrapper may not match. So directly writing this cast would not work. But we can obtain the same behavior here by essentially re-writing operator=:

if (p_val != nullptr) {
    p_val->reference();
    *(void**)p_ptr = p_val->_owner;
} else {
    *(void**)p_ptr = nullptr;
}

This code also assumes p_ptr already points to a null Ref<T> ready to receive the value, such that we would not need to also make sure it unrefs whatever it may have contained before.
If this is not true however, we'd also need to deal with it (I hope not).
If we are also not supposed to give ownership from here and Godot takes it by itself after the call engine-side, then more changes might be required, though we could test and see if that cause any leak in other situations.
I think this would need confirmation with an author of the API.

@Zylann
Copy link
Collaborator

Zylann commented Nov 19, 2021

I think Godot is providing a pointer to the adress where the return value begins. In this case, a Ref<T>. Which at the end of the day, is a pointer to T*. Assigning most types is done by casting the pointer into the associated types and just assigning. The problem is that in the case of Ref<T>, assignment is not trivial, as it's not just a simple pointer like Object*. Types like Array or Dictionary also aren't trivial, but unlike Object types these types are castable directly such that invoking their operator= is simple (so *((m_type *)p_ptr) = p_val; works fine for them).

There cannot be two separate reference counts because the reference count is stored inside the Godot object. So either side could increment or decrement, it should remain consistent, as long as each side does it according to the same rules.

But currently it's seems to be creating two separate ownerships as encode seems to break something there.

Is this after trying with the code I suggested? Or the original code?

Storing a Ref on GDExtension side does not work as eventually at delete
time there's two separate instances where memdelete is called on the same
object - this can only happen if the two objects are not using the same Ref
counts. This part is confusing. If I had to guess - something is wrong with
RefCounted working via current encode.

Not exactly sure what's up there, but current encode looks wrong. So far all I've done is read the code and deduced what should be done from what I know, I need to setup Godot 4 and the associated CppBindings to actually give it a try.

@kidrigger
Copy link
Author

kidrigger commented Nov 20, 2021

There cannot be two separate reference counts because the reference count is stored inside the Godot object. So either side could increment or decrement, it should remain consistent, as long as each side does it according to the same rules.

Made a mistake in debugging. Attaching the Godot stack trace for the crash at the bottom as (A).
The count isn't different - but because the Ref destructor is called from the plugin - it tries to obtain the RefCount to delete.
Since Ref is already deleted by godot (as inferred from the discussion - returning from plugin to godot reduces the count one extra time during encode which creates a conversion without adding a count) this causes the Ref in plugin to dangle. There is a double delete for sure - but the crash is not due to RefCount being wrong but due to RefCount itself dangling.

Is this after trying with the code I suggested? Or the original code?

It was with Original code.
Your fix works for the non-storage part and the storage part. It is aligned with my observations and the hack-fix as well.
I will test it further later - currently testing with a single static variable for storage and an std::deque being used as a set worked. (The lack of the godot containers is a bit disruptive however. Godot code doesn't take kindly to std::set)

Should be fixed. Will make a PR once tested satisfactorily.

Adding links to my plugin and godot branch for reproduction. (Let me know any mistakes I might have made - easier to fix while everything is WIP)

Note that the fix has already been applied via submodules (The godot-cpp in use is my fork with your proposed fix)

A:

[0] std::_Atomic_integral<unsigned int,4>::fetch_add (D:\Visual Studio\VC\Tools\MSVC\14.28.29333\include\atomic:1458)
[1] std::_Atomic_integral_facade<unsigned int>::fetch_sub (D:\Visual Studio\VC\Tools\MSVC\14.28.29333\include\atomic:1687)
[2] SafeNumeric<unsigned int>::decrement (D:\Work\godot\core\templates\safe_refcount.h:82)
[3] SafeRefCount::unrefval (D:\Work\godot\core\templates\safe_refcount.h:182)
[4] RefCounted::unreference (D:\Work\godot\core\object\ref_counted.cpp:76)
[5] call_with_ptr_args_ret_helper<RefCounted,bool> (D:\Work\godot\core\variant\binder_common.h:254)
[6] call_with_ptr_args_ret<RefCounted,bool> (D:\Work\godot\core\variant\binder_common.h:503)
[7] MethodBindTR<RefCounted,bool>::ptrcall (D:\Work\godot\core\object\method_bind.h:452)
[8] gdnative_object_method_bind_ptrcall (D:\Work\godot\core\extension\gdnative_interface.cpp:808)
[9] godot::internal::_call_native_mb_ret<bool> (D:\Work\godot-video-reference\godot-cpp\include\godot_cpp\core\engine_ptrcall.hpp:59)
[10] godot::RefCounted::unreference (D:\Work\godot-video-reference\godot-cpp\gen\src\classes\ref_counted.cpp:56)
[11] godot::Ref<godot::VideoStreamPlaybackReference>::unref (D:\Work\godot-video-reference\godot-cpp\include\godot_cpp\classes\ref.hpp:215)
[12] godot::Ref<godot::VideoStreamPlaybackReference>::~Ref<godot::VideoStreamPlaybackReference> (D:\Work\godot-video-reference\godot-cpp\include\godot_cpp\classes\ref.hpp:228)
[13] godot::Ref<godot::VideoStreamPlaybackReference>::`scalar deleting destructor'

@kidrigger
Copy link
Author

kidrigger commented Nov 20, 2021

Writing a conclusion as to why this is logically correct and not a Hack.

Original Code:

_FORCE_INLINE_ static void encode(Ref<T> p_val, const void *p_ptr) {
*(void **)p_ptr = p_val->_owner;
}

This directly does the ownership transfer without incrementing RefCounted - as p_ptr is also another new Ref<T> that has to be counted.
At the end of the calling function the Ref<T> goes out of scope thus decrementing the RefCounted by 1. Thus there is exactly 1 extra copy that is unaccounted - causing the issue.

step ref count actual count
before call_with_ptr_args_ret_helper n n
in encode n n+1
after encode n n+1
after call_with_ptr_args_ret_helper n-1 n

Thus at n=1, memdelete is called and the Ref<T> dangles.

By using the fix

if (p_val != nullptr) {
    p_val->reference();
    *(void**)p_ptr = p_val->_owner;
} else {
    *(void**)p_ptr = nullptr;
}

We increment the count manually (1 extra count before doing the pointer assignment thereby increasing the number of references by 1)

step ref count actual count
before call_with_ptr_args_ret_helper n n
after p_val->reference() n+1 n
after encode n+1 n+1
after call_with_ptr_args_ret_helper n n

Thus the invariant count is correct at the end. The fix is logically correct.

kb173 added a commit to boku-ilen/geodot-plugin that referenced this issue Feb 3, 2022
These are likely related to refcounting issues.
The change in the SConstruct script fixes a crash that occured somewhere in `bind_get_argument_type` when registering classes and binding methods.
Commenting out the `delete`s in destructors fixed a `double free or corruption` crash that occured after the fix described above.

Likely related to godotengine/godot-cpp#652, but neither godotengine/godot-cpp#660 or godotengine/godot-cpp#662 seemed to help...
kb173 added a commit to boku-ilen/geodot-plugin that referenced this issue Feb 15, 2022
These are likely related to refcounting issues.
The change in the SConstruct script fixes a crash that occured somewhere in `bind_get_argument_type` when registering classes and binding methods.
Commenting out the `delete`s in destructors fixed a `double free or corruption` crash that occured after the fix described above.

Likely related to godotengine/godot-cpp#652, but neither godotengine/godot-cpp#660 or godotengine/godot-cpp#662 seemed to help...
kidrigger added a commit to kidrigger/godot that referenced this issue Sep 24, 2022
Adds VideoStream and relevant resource loaders to migrate
external GDNative plugins to GDExtension.

Theora is modified to use new base class and register with
VideoDecoderServer.

There are known bugs in godot-cpp w.r.t. reference counting in Ref
(godotengine/godot-cpp#652)
However, from godot's side this should work just fine once that is
fixed.
kidrigger added a commit to kidrigger/godot that referenced this issue Oct 6, 2022
Adds VideoStream and relevant resource loaders to migrate
external GDNative plugins to GDExtension.

Adds a VideoStreamLoader as a specialization of ResourceFormatLoader
as ClassDB::is_parent_class is inaccessible from GDExtension currently.

Using Object* instead of Ref<T> in order to avoid the refcount bug
(godotengine/godot-cpp#652)
Also another bug is in ResourceLoader in use on the extension side that
requires fixing.
@Faless
Copy link
Contributor

Faless commented Dec 15, 2022

Fixed via #958.

@Faless Faless closed this as completed Dec 15, 2022
lyuma pushed a commit to kidrigger/godot that referenced this issue Jan 31, 2023
Adds VideoStream and relevant resource loaders to migrate
external GDNative plugins to GDExtension.

Adds a VideoStreamLoader as a specialization of ResourceFormatLoader
as ClassDB::is_parent_class is inaccessible from GDExtension currently.

Using Object* instead of Ref<T> in order to avoid the refcount bug
(godotengine/godot-cpp#652)
Also another bug is in ResourceLoader in use on the extension side that
requires fixing.
Streq pushed a commit to Streq/godot that referenced this issue Feb 9, 2023
Adds VideoStream and relevant resource loaders to migrate
external GDNative plugins to GDExtension.

Adds a VideoStreamLoader as a specialization of ResourceFormatLoader
as ClassDB::is_parent_class is inaccessible from GDExtension currently.

Using Object* instead of Ref<T> in order to avoid the refcount bug
(godotengine/godot-cpp#652)
Also another bug is in ResourceLoader in use on the extension side that
requires fixing.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug This has been identified as a bug topic:gdextension This relates to the new Godot 4 extension implementation
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants