Please allow using Packed Arrays like Pointers onto other types of Packed Arrays. #3917

Open
SolsticeProjekt opened this issue Feb 5, 2022 · 10 comments

Comments

@SolsticeProjekt

SolsticeProjekt commented Feb 5, 2022

Describe the project you are working on

It's about performance, capabilities, limitations and overcoming them.

Context:
Godot is for me like you've given me a Commodore 64 with a manual and said "Have at it!".
Such a product lasts! It really does! On your part it mostly only requires a manual explaining everything.
On the user's part it only requires a curious mind,
the ability to express creativity through the machine's capabilities and limitations
and the will to overcome them.

Not only is the C64 demo scene alive and kicking,
the market for new games for the machine is actually growing.

Anyhow.

I do all kinds of stuff related to experimenting with the Engine's capabilities.
I'm writing a benchmark for 3.4 and 4.0, to learn how the Engine works and how it shouldn't.

Fun Fact! In Godot 3.4, using a 2D Array() LookUpTable filled with index calculations y * width + x,
is faster than calculating the indices every inner loop!
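The lookup-table idea can be sketched roughly as follows. This is a minimal Python illustration of the data structure only; `WIDTH`, `HEIGHT` and `build_index_lut` are made-up names, and the sketch says nothing about relative speed, which in the post is a measured result specific to Godot 3.4.

```python
WIDTH, HEIGHT = 4, 3  # hypothetical grid size


def build_index_lut(width: int, height: int) -> list[list[int]]:
    """Precompute the flat index y * width + x for every (x, y) cell."""
    return [[y * width + x for x in range(width)] for y in range(height)]


lut = build_index_lut(WIDTH, HEIGHT)
grid = list(range(100, 100 + WIDTH * HEIGHT))  # flat, row-major storage

# Inner loop: look the index up instead of recomputing y * WIDTH + x.
total = 0
for y in range(HEIGHT):
    row = lut[y]
    for x in range(WIDTH):
        total += grid[row[x]]
```

Whether the table beats the multiplication depends entirely on the interpreter, so it would need to be measured per engine version.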

Godot 4's GDScript 2.0 is way faster than the current stable implementation! :D

So, anyhow, that's what I do. Testing, experimenting, making use of the results.
It's about performance, capabilities and experimenting.

Describe the problem or limitation you are having in your project

The engine lacks the ability to go through Arrays of bigger types using smaller types.
From my perspective that's a massive oversight, adding unnecessary workload
and limiting creative thinking when it comes to programming.

Being able to use PackedArrays like typed pointers pointing into other PackedArrays should be a thing.

It's not unnecessarily complicated even for completely clueless people,
it gives room for creative ideas related to optimization,
and indirectly familiarizes users with the linear nature of memory.

Describe the feature / enhancement and how it helps to overcome the problem or limitation

  • Helps avoid unnecessary copies.
    Requiring a copy of the whole array for byte-level access via to_byte_array() forces people to waste resources. Wherever no copy is needed, it's a waste of CPU time, memory bandwidth and cache. Unless the data is supposed to be modified, there is no reason to create a copy. The user should have the option to create one, but not be forced to do it unnecessarily.

  • Allows for faster working through Arrays. (1/2)
    Imagine you have an Array of Bytes (works with 32bit Ints as well, and floats, of course, when you're bit-hacking!) and you need to add the value 7 to all of them. With the current implementation, you go through the elements one by one and add to each.
    With my proposal implemented, you'd cast a 64bit Int Array onto your Byte Array, have a 64bit variable QWord ...

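var QWord : int = 0

# Pack the value 7 into each of the eight byte lanes: 0x0707070707070707.
# Note: ^ is bitwise XOR in GDScript, not exponentiation, so shift instead.
for q in 8:
    QWord += 7 << (q * 8)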
  • (2/2)
    ... or maybe you use a constant. Doesn't matter. Then, with that, you'd run through the 64bit Array,
    (which points into the Byte Array) and add the QWord to eight bytes at once,
    thus doing one eighth of the additions required to get the same job done.
    One eighth of the memory accesses. One eighth of the loop iterations.

For an Array of a million elements, you'd be cutting everything down to 1,000,000 / 8 = 125,000.
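A minimal sketch of the eight-bytes-at-once addition, using Python's `memoryview.cast` as a stand-in for the proposed cast (the function name and constants are hypothetical, not Godot API). One caveat is worth spelling out: adding the packed word directly is only safe while no byte overflows, because a carry would spill into the neighbouring byte. The sketch therefore adds the low seven bits of each lane and patches the top bit separately.

```python
LANES = 0x0101010101010101   # a 1 in every byte lane
LOW7 = 0x7F7F7F7F7F7F7F7F    # low seven bits of every lane
HIGH = 0x8080808080808080    # top bit of every lane


def add_to_all_bytes(buf: bytearray, value: int) -> None:
    """Add `value` (mod 256) to every byte of `buf`, eight bytes per step."""
    if len(buf) % 8 != 0:
        raise ValueError("length must be a multiple of 8")
    packed = (value & 0xFF) * LANES            # 7 -> 0x0707070707070707
    with memoryview(buf).cast("Q") as words:   # 64-bit view, no copy
        for i in range(len(words)):
            w = words[i]
            # Add the low 7 bits per lane, then fold in the top bits with XOR,
            # so a carry never crosses from one byte into the next.
            words[i] = ((w & LOW7) + (packed & LOW7)) ^ ((w ^ packed) & HIGH)


data = bytearray([0, 1, 2, 250, 251, 252, 253, 254] * 2)
add_to_all_bytes(data, 7)
```

Every byte wraps around on its own here (250 + 7 becomes 1), which matches what a per-byte loop would produce.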

  • Faster checks of status variables.
    Imagine you have an array of eight bytes Bytes[], indicating a status. Currently, to check every status,
    you'd have to go through the Array one by one. With my proposal, you'd cast a 64bit PackedArray, size 1,
    and can check the states all at once. In the "native code" world I use this for threads.

Example: I have an Array of 8 Bytes, each being 0 or not, with "not" indicating that the thread is busy.
Instead of going through each element one by one, I read a single QWord starting at Bytes[0] and do one check.
In Godot, that'd be: while QWord != 0: pass
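The status check could look roughly like this, again with Python's `memoryview.cast` standing in for the proposed feature (`status`, `flags` and `any_busy` are illustrative names). The sketch is single-threaded; real worker threads would also need attention to memory visibility, which is not shown.

```python
status = bytearray(8)                   # one byte per worker, 0 means idle
flags = memoryview(status).cast("Q")    # the same 8 bytes seen as one 64-bit word


def any_busy() -> bool:
    """One comparison instead of eight."""
    return flags[0] != 0


idle_at_start = any_busy()
status[5] = 1                           # worker 5 reports busy
busy_after_set = any_busy()
status[5] = 0                           # worker 5 is done
idle_again = any_busy()
```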

Hm.
More?

Describe how your proposal will work, with code, pseudo-code, mock-ups, and/or diagrams

var QWords := PackedInt64Array(range(100))    # 64bit
var DWords := PackedInt32Array(QWords[0])     # 32bit, giving an index (proposed syntax)
# *cries in 16 bits*
var Bytes := PackedByteArray(QWords)          # No index implies the beginning (proposed syntax)


func _ready() -> void:
    var Int32s : PackedInt32Array
    Int32s = QWords.point_at(69)    # To nicely show how it could be done at run-time.

    Bytes[0] = 255
    Bytes[1] = 255
    Bytes[2] = 255
    Bytes[3] = 255
    print( DWords[0] )    # 0xFFFFFFFF: 4,294,967,295 if read as unsigned, -1 as a signed 32-bit value
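For comparison, the same aliasing behaviour can be tried out today in Python, whose `memoryview.cast` gives typed views over one buffer. This is only an analogy for the proposed semantics, not Godot code; the variable names mirror the mock-up above, and it assumes a platform where the `I`/`i` formats are 4 bytes wide.

```python
from array import array

qwords = array("q", range(100))       # 100 signed 64-bit integers
raw = memoryview(qwords).cast("B")    # byte view of the same memory
dwords = raw.cast("I")                # unsigned 32-bit view
sdwords = raw.cast("i")               # signed 32-bit view
tail = raw[69 * 8:].cast("i")         # 32-bit view starting at element 69

for i in range(4):
    raw[i] = 255                      # write through the byte view

unsigned_value = dwords[0]            # all four bytes set: 0xFFFFFFFF
signed_value = sdwords[0]             # same bits read as signed

tail[0] = 0
tail[1] = 0                           # clears both halves of qwords[69]
```

The same four bytes read as 4294967295 through the unsigned view and as -1 through the signed one, which is why the signedness of the target array matters for the printed value.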

PS:
Asking here to potentially save time writing another proposal ...
Can we have signed Bytes and unsigned DWords/QWords, or is that impossible?

Thank you!

If this enhancement will not be used often, can it be worked around with a few lines of script?

A copy of the data is not a sensible workaround.

Is there a reason why this should be core and not an add-on in the asset library?

@YuriSizov
Contributor

> The engine lacks the ability to go through Arrays of bigger types using smaller types.
> From my perspective that's a massive oversight, adding unnecessary workload
> and limiting creative thinking when it comes to programming.

We prefer when proposals come from a concrete limitation which appeared in your actual project, rather than nice-to-haves which can be theoretically good. Do you have an actual limitation that you face, and not something that requires the reader to imagine having an array of bytes?

I feel from your examples, that you may want to write an engine module or use GDNative, rather than GDScript.

@Xrayez
Contributor

Xrayez commented Feb 5, 2022

This proposal is about utilizing CPU capabilities that happen to efficiently operate on DWORD/QWORD. I think it's possible to implement for GDScript; the question is whether you really need to use GDScript and not C++ or another compiled language for this.

I think image libraries operate on uint32_t alone for efficiency; this is not necessarily the case in Godot's Image class. @reduz was not against adding/improving Image class in Godot 4.x, see godotengine/godot#45924 (comment). So there may be a possibility that core data structures like PackedByteArray could be improved to speed up data processing in general.

Godot has BitMap which allows to optimize for storage.

These are just examples of course.

> Asking here to potentially save time writing another proposal ...

Consider supporting #2069, thanks!

@Calinou
Member

Calinou commented Feb 5, 2022

> This proposal is about utilizing CPU capabilities that happen to efficiently operate on DWORD/QWORD. I think it's possible to implement for GDScript; the question is whether you really need to use GDScript and not C++ or another compiled language for this.

This sounds very similar to #290.

@Xrayez
Contributor

Xrayez commented Feb 5, 2022

I think #290 is totally related. I'm not completely proficient in this, but this proposal is more about being able to re-interpret the data (and taking advantage of memory address alignment and whatnot) rather than taking advantage of CPU-specific SIMD instructions (but not quite specific nowadays).

@Xrayez
Contributor

Xrayez commented Feb 5, 2022

@SolsticeProjekt I've stumbled upon this PR in Godot 4.x, this may help you perhaps: godotengine/godot#47761. Though I'm not sure how marshalling could improve efficiency in this case, maybe not, or maybe just in some cases.

@SolsticeProjekt
Author

SolsticeProjekt commented Feb 6, 2022

> I feel from your examples, that you may want to write an engine module or use GDNative, rather than GDScript.

Thank you for sharing your feelings. Very helpful.

Of course I had a look at how Godot can use native code, IPC, etc. I was disappointed, actually, because there's no shared memory/memory mapping. GDNative is overkill. Give me a shared memory buffer, let me execute an external process,
give me something to tell Godot "it's done". Bonus points would be the ability to actually pin threads to cores,
because especially (or specifically, but it's generally useful) on Windows this needs manual control for efficiency.

The above covers a lot of use cases, while avoiding all the unnecessary work just to interface with native code.

The alternative would be, of course, using sockets. I haven't yet checked if Godot supports zero-copy for localhost "transfers".
It's still on my list of things to do (and propose, if necessary). Iirc it's supported on all platforms. GDNative? No thanks!


@Xrayez Yes, uint32 and uint64. Considering the relatively large number of people asking how to manipulate image data,
it's kind of odd that the two types used for exactly that aren't supported. And while signed bytes seem unnecessary, they're useful for -1/+1 switches and some other neat tricks I can't come up with right now. (slurps morning coffee)

Also where's the 16 bit types! lol

I've looked at 2069. I see what you're doing. Are you sure that having such ideas removed from "proposals" is a good idea?
When it removes "eyes from ideas" it's the equivalent of "moving things away so we can better ignore them"? :D

@Calinou My proposal is tangentially related to the SIMD one. That is also on my list of things I wanted to propose.
(I only check pre-existing proposals when I get to it. One step at a time.)

Having functions optimized for use on big arrays, with big data, threading. Like, as one example of many,
the ability to square-root through a big array. Problem is that there are many things that can be done and they'd all have
to be programmed. I'm probably supposed to post there, instead of talking about it here. Wanted to get another coffee anyway...

@Xrayez Re: 47761. I don't understand what this does. "Encode" implies to me that the data is going to be prepared/converted, no? That's not what I'm asking for and just a huge waste of resources for no good reason. I see no point in "converting" from one type to another, when it's all just bytes in memory anyway. What I'm asking for is having different types of PackedArrays be able to point at the same spots in memory, where the data resides. Pointers! They're awesome! :D
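The distinction being drawn here, converting into a new buffer versus pointing a differently typed view at the same memory, can be sketched in Python. This is an analogy only and makes no claim about what godotengine/godot#47761 does internally; the variable names are illustrative.

```python
source = bytearray(b"\x01\x02\x03\x04\x05\x06\x07\x08")

converted = bytes(source)                 # conversion: allocates and copies
view = memoryview(source).cast("H")       # reinterpretation: 16-bit view, no copy

source[0] = 0xFF
source[1] = 0xFF                          # modify the original buffer afterwards

copy_sees_change = converted[0] == 0xFF   # the copy is detached from the source
view_sees_change = view[0] == 0xFFFF      # the view reads the same memory
```

The copy keeps its old contents while the view reflects the write immediately, which is the behaviour the proposal asks for.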

I hope I didn't miss anything.

@Calinou
Member

Calinou commented Feb 6, 2022

> @Xrayez Yes, uint32 and uint64. Considering the relatively large number of people asking how to manipulate image data,
> it's kind of odd that the two types used for exactly that aren't supported. And while signed bytes seem unnecessary, they're useful for -1/+1 switches and some other neat tricks I can't come up with right now. (slurps morning coffee)

Godot 4.0 has PackedInt32Array and PackedInt64Array (and PackedFloat32Array/PackedFloat64Array), as this distinction makes sense for performance-critical code (in addition to memory usage for large arrays). However, for scalar types, this distinction is unlikely to make a meaningful performance difference.

The issue with unsigned types is that they can work against the user, especially for people who are not prepared to deal with them. This is why most high-level languages don't expose unsigned types.
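One common way unsigned arithmetic can surprise people is wrap-around on subtraction. The sketch below is a generic illustration and an assumption about the kind of pitfall meant, not something stated in the comment; it uses Python's `ctypes` fixed-width integers, and `remaining` and `cost` are made-up names.

```python
import ctypes

remaining = ctypes.c_uint32(3)
cost = 5

# Unsigned 32-bit: 3 - 5 wraps around instead of going negative.
wrapped = ctypes.c_uint32(remaining.value - cost).value

# Signed 32-bit: the same subtraction gives the intuitive result.
signed = ctypes.c_int32(3 - 5).value

# A guard such as "wrapped >= 0" can never fail for an unsigned value.
guard_always_true = wrapped >= 0
```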

Also consider #2993 for PackedByteArray usability improvements.

@YuriSizov
Contributor

> Thank you for sharing your feelings.

It's just a figure of speech, but thank you for caring.

> Of course I had a look at how Godot can use native code, IPC, etc. I was disappointed, actually, because there's no shared memory/memory mapping. GDNative is overkill. Give me a shared memory buffer, let me execute an external process, give me something to tell Godot "it's done". Bonus points would be the ability to actually pin threads to cores, because especially (or specifically, but it's generally useful) on Windows this needs manual control for efficiency.

If you want to manage memory more efficiently, or do low-level optimization, you kind of have to go low-level on a per-project basis. We don't want to have user-scope scripting API that is too confusing and complex to use. Because those who can go complex have other options, and those who can't — don't. If your proposed solution can be reasonably integrated into the API, then it can be considered, but you still need to show understanding of either a project-specific problem that is representative of many projects, or an engine-architecture limitation that is tying our hands maintaining and developing it.

If you feel like you've done that, then by all means ignore my comments and don't change anything. Consider them first-line support to make sure you get your proposal there because this has to go through reduz, and reduz can just come here and tell you "Nobody sanely does that, so we don't actually need it" if that's what he thinks. So it's in your interest to build a strong case.

@SolsticeProjekt
Author

SolsticeProjekt commented Feb 6, 2022

Are the two of us now talking about my proposal, or about what you suggested?

Regarding my reaction to your proposal: I do not understand how something like this would make the whole API more complicated. How does that one thing connect to all of the rest?

90% of this happens under the hood anyway and it's not actually a lot of code. Calling a process is simple,
mapping memory is simple, even sandboxing a child process is actually really simple to do, even on Windows.
I've tried! Of course, I didn't do browser-level efforts of sandboxing, like in Chrome, but we're not talking about a browser here.

It can be as simple as "CallCode(process, PackedByteArray, StatusVariable)", where StatusVariable is a pointer to
a Godot variable for the external code to report to, and PackedByteArray is the pointer to a shared memory map.

(I can also really just work with sockets on localhost. That's not as good, but still good,
but then I'd ask to make sure zero-copy is being supported. But that's not what this proposal is about, at all.)

Worries about people writing beyond the buffer can easily be dealt with using a Guard Page. Godot is generally lacking
memory mapping features for some reason, despite the obvious benefits, like actual ring buffers.

As far as I can tell, all platforms support some memory mapping, but don't quote me on that one.
That being said, I'm not sure if Godot aims at full feature equality among all platforms.
Seems unrealistic to me, but more power to anyone who can pull it off without sacrificing too much.

Anyhow, this seems rather off-topic to me.

I can make a separate proposal about this, if you want me to?
There are definitely better solutions to executing native code than forcing people to understand C,
and everything that comes with it.

@TheYellowArchitect

I cannot practically understand the benefits of this PR (square root a packed int array? ring buffers?)
Sounds proper for a GDExtension exclusive to the project, definitely not for GDScript if it means new parameters for all encoding/decoding functions

5 participants