Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Centralize basic math types and functions (part 1) #368

Draft
wants to merge 9 commits into
base: main
Choose a base branch
from

Conversation

dimitry-ishenko
Copy link
Contributor

@dimitry-ishenko dimitry-ishenko commented Jul 14, 2023

This is a first in a series of PRs meant clean up and centralize basic math constants, vectors and matrices. Currently these are scattered all over the project and many of them have overlapping functionality.

For example, there are VECTOR3 and VECTOR4 types defined in the SDK (the OrbiterAPI.h file). Then there are Vector and Vector4 types defined in the Orbiter (the Vecmat.h file). There are also FVECTOR3 and FVECTOR4 which are equivalents of VECTOR3 and VECTOR4 but with the float data type. Each one of these classes overloads their own set of operators and defines their own versions of different functions. This results in messy and duplicate code.

Similar situation exists for the matrices, and simple math constants eg, 𝜋, etc.)

The plan is to unite all these types and define a set of operator and function templates that are uniform and are shared by all these types. The steps are:

  • Define operator and function templates for vectors.
  • Centralize VECTOR4 type.
  • Centralize VECTOR3 type.

Continued in part 2.

@dimitry-ishenko dimitry-ishenko marked this pull request as ready for review July 14, 2023 16:31
@dimitry-ishenko dimitry-ishenko changed the title Centralize basic math types and functions Centralize basic math types and functions (part 1) Jul 14, 2023
@TheGondos
Copy link
Contributor

Nice idea to factorize similar code. Does this break the ABI? I think we need API/ABI changes tags for the PRs so we can document that (or maybe an "orbiter-next" branch to pull against for breaking changes?).
While we're at it, maybe we can replace the homegrown vector code with an existing library? (I've used glm in the past, but it's OpenGL oriented so maybe not the best choice).
I see some to_XXX() functions, any reason for not using cast operators?
I'm not that enthusiastic about mass replacing the typedefs with using statements (but I can live with it)

@dimitry-ishenko
Copy link
Contributor Author

dimitry-ishenko commented Jul 16, 2023

@TheGondos thanks for your review.

Does this break the ABI?

No. At least I am trying really hard not to. So far my goal was to keep the ABI intact as much as possible to keep the existing add-ons working. That's why you will see in my other PRs some vector types were kept as unions while others are structss. Some of the types were also kept inside the oapi namespace, while others are in the global namespace. All to keep function signatures the same.

While we're at it, maybe we can replace the homegrown vector code with an existing library?

I've briefly looked into it and eigen as well. The biggest obstacle is breaking the existing ABI. Plus, we will only be using a small fraction of it.

I see some to_XXX() functions, any reason for not using cast operators?

That's a temporary "plug". In later PRs I've added morph_to functions to convert between vector types. I wanted to keep it similar to the static_cast operator. I purposely didn't want implicit conversions to prevent type abuse. In the current code there are many instances of vectors being used as colors with implicit conversions to/from colors. It's just a mess.

I'm not that enthusiastic about mass replacing the typedefs with using statements (but I can live with it)

typedef syntax is very confusing for pointers, and especially for function pointers, which is why they've added the using keyword. It's the same thing, but more readable:

typedef bool (*Listentry_clbk)(char *name, DWORD idx, DWORD flag, void *usrdata);

vs

using Listentry_clbk = bool (*)(char *name, DWORD idx, DWORD flag, void *usrdata);

@TheGondos
Copy link
Contributor

I've briefly looked into it and eigen as well. The biggest obstacle is breaking the existing ABI. Plus, we will only be using a small fraction of it.

I agree

That's a temporary "plug". In later PRs I've added morph_to functions to convert between vector types. I wanted to keep it similar to the static_cast operator. I purposely didn't want implicit conversions to prevent type abuse. In the current code there are many instances of vectors being used as colors with implicit conversions to/from colors. It's just a mess.

You can use an explicit identifier to prevent implicit conversion

typedef syntax is very confusing for pointers, and especially for function pointers, which is why they've added the using keyword. It's the same thing, but more readable:

YMMV;-) it was primarily introduced for templates but like I said, I can live with it

@jarmonik
Copy link
Contributor

typedef syntax is very confusing for pointers, and especially for function pointers, which is why they've added the using keyword. It's the same thing, but more readable:

typedef bool (*Listentry_clbk)(char *name, DWORD idx, DWORD flag, void *usrdata);

vs

using Listentry_clbk = bool (*)(char *name, DWORD idx, DWORD flag, void *usrdata);

I kinda disagree, it's only a matter of taste. I have reviewed some of the changes and I fail to see the point in a lot of modifications. As an example vector length function has been called 'length()' about 20 years already and now you have changed it to 'len()' from the whole project. The same applies to 'dotp()' and 'normalise()' functions at least. Vector type casting has been changed.
FVECTOR4.rgb, FVECTOR4.xyz and FVECTOR4.a no longer exists. Many of these "casts" were supposed to be compatible with HLSL and now it's gone. Is there actual reason for this or is it just because you didn't like it ? Might be good idea to discuss about your plans before turning things upside down.

@dimitry-ishenko
Copy link
Contributor Author

dimitry-ishenko commented Jul 19, 2023

I have reviewed some of the changes and I fail to see the point in a lot of modifications.

Hi @jarmonik. Good to have you back and thanks for your review. Let me explain.

The basic point is to clean up and modernize the code in the hopes that it will attract more developers. Over the course of however many years the project has grown, but very little, if any maintenance and refactoring has been done. There is duplicate code. There is dead code. There is home-grown and windows-specific code that can be replaced with STL.

As an example vector length function has been called 'length()' about 20 years already and now you have changed it to 'len()' from the whole project.

I agree with this. I went the "pythonic" way and shaved off a few letters. But, you are right HLSL/OpenGL uses length() and if it rubs you the wrong way, we can change it to length(). I've no problem with that.

The same applies to 'dotp()'

What happened here is that initially OrbiterAPI.h defined dotp() for the VECTOR3 class :

inline double dotp (const VECTOR3 &a, const VECTOR3 &b);

Then in commit dc3f8c5 you have added dot() for FVECTOR2, FVECTOR3 and FVECTOR4:

inline float dot(const FVECTOR2& v, const FVECTOR2& w);
inline float dot(const FVECTOR3& v, const FVECTOR3& w);
inline float dot(const FVECTOR4& v, const FVECTOR4& w);

So, when adding my function/operator templates for all the vector classes, I've decided to stick with dot(), since this is also what HLSL/OpenGL calls it.

Same thing with crossp() and cross() (which you've also added).

and 'normalise()' functions at least.

Similar scenario, but a bit more messed up. Initally Orbiter had normalise() and unit() in OrbiterAPI.h:

inline void normalise (VECTOR3 &a);

inline VECTOR3 unit (const VECTOR3 &a);

The normalise() function normalized its vector parameter in-place and unit() returned normalized vector without modifying the original. Later, you've introduced a normalize() in DrawAPI:

inline FVECTOR3 normalize(const FVECTOR3& v);

The messed up part is that it does the same thing as unit() but is spelled almost identical to normalise(). So, if some poor addon developer wants to normalize their vector in-place, but accidentally writes normalize(v); instead of normalise(v); ... Bugs, here we come. 😜

So, for the good of all mankind, I've decided to get rid of normalise() and normalize() and only keep unit(). After all, v = unit(v); is equivalent to normalise(v);.

Now, if we want to be consistent with HLSL/OpenGL, we can rename it to normalize(). I have no problem with that. It's your call.

Vector type casting has been changed. FVECTOR4.rgb, FVECTOR4.xyz and FVECTOR4.a no longer exists. Is there actual reason for this or is it just because you didn't like it ?

Because they are illegal in C++. I know it is very convenient, but accessing multiple members of a union at the same time is a UB in C++. According to §11.5.1 [class.union.general] section 2 of the standard:

In a union, a non-static data member is active if its name refers to an object whose lifetime has begun and has not ended. At most one of the non-static data members of an object of union type can be active at any time, that is, the value of at most one of the non-static data members can be stored in a union at any time.

If you noticed, I'v even removed the data member from the vector classes and added an "ugly" trick in the index operator[], which luckily gets optimized out by the compiler.

We can try to come up with a better solution, but I think what will really help is if we start using COLOUR3 and COLOUR4 classes for colors, and leave vector classes for other things. Unfortunately, this will break the ABI in some cases.

I hope this makes more sense now. And, please don't take any of this as a blame on you. 😅 I know how these things go sometimes, and which is why I've decided to do some clean-ups before I start contributing in other ways.

@jarmonik
Copy link
Contributor

The basic point is to clean up and modernize the code in the hopes that it will attract more developers.

Or shove off the existing ones.

So, when adding my function/operator templates for all the vector classes, I've decided to stick with dot(), since this is also what HLSL/OpenGL calls it.

It's not really your call to make. The next Orbiter is scheduled to launch with the DX7 engine in place (x86) build. I don't like the idea of breaking the API compatibility there just over a few naming things but it's Martin's call. After that when switching to x64 there is more room to do API cleanup since the add-ons are likely to need code level changes anyway. There is slightly different function naming between VECTOR3 and FVECTOR3 but didn't really consider it that much of an problem.

Because they are illegal in C++. I know it is very convenient, but accessing multiple members of a union at the same time is a UB in C++. According to §11.5.1 [class.union.general] section 2 of the standard:

Can't comment on that matter other than it looks solid and seems to work fine.

We can try to come up with a better solution, but I think what will really help is if we start using COLOUR3 and COLOUR4 classes for colors, and leave vector classes for other things. Unfortunately, this will break the ABI in some cases.

The plan was very much the opposite of that besides the API interface of course. Many COLOUR4 have been changed to FVECTOR4 that why it has xyzw and rgba, it's designed to deal with both. Also the plan is to stop using DX specific data types such as D3DXCOLOUR4, D3DXVECTOR4 and functions like D3DXMatrixMultiply() and rely more on FVECTOR4 datatype. I have looked at Intel's SIMD (SSE2) math library and the plan was to make FVECTOR4 to use it. This is a part of a long term porting plan to Vulkan.

@dimitry-ishenko
Copy link
Contributor Author

Or shove off the existing ones.

Who exactly did I shove off?

I don't like the idea of breaking the API compatibility there just over a few naming things.

As I said earlier, you are the one who introduced new function names into the API, not me. If you are so stuck on using dotp(), we can re-add it.

There is slightly different function naming between VECTOR3 and FVECTOR3 but didn't really consider it that much of an problem.

For you maybe not, but for other people trying to contribute, it is.

Can't comment on that matter other than it looks solid and seems to work fine.

It works today, but if it breaks tomorrow because you've updated your compiler, you will be blaming the compiler of course. There is a standard for a reason.

Many COLOUR4 have been changed to FVECTOR4 that why it has xyzw and rgba, it's designed to deal with both.

That's plain wrong, because you are shoving two two different concepts in one class. But even then, there are other ways of handling that without relying on UB.

I have looked at Intel's SIMD (SSE2) math library and the plan was to make FVECTOR4 to use it.

Most compilers are smart enough nowadays to use SIMD under the hood. This is not something you need to worry about.

You seem to be hung up on a few changes that you don't like and completely ignoring the improvements I've made. Do you not really see any value in these PRs?

@jarmonik
Copy link
Contributor

It works today, but if it breaks tomorrow because you've updated your compiler, you will be blaming the compiler of course. There is a standard for a reason.

If there is a technical problem (not a matter of taste issue) with the FVECTOR4 implementation then it needs to be fixed of course. But so far I have found no issue there. If there is then it wouldn't be just limited to FVECTOR4. I don't have time to start digging into it right now.

(https://eel.is/c++draft/class.union.general#note-1): One special guarantee is made in order to simplify the use of unions: If a standard-layout union contains several standard-layout structs that share a common initial sequence ([class.mem]), and if a non-static data member of an object of this standard-layout union type is active and is one of the standard-layout structs, it is permitted to inspect the common initial sequence of any of the standard-layout struct members; see [class.mem].
— end note]`

That's plain wrong, because you are shoving two two different concepts in one class. But even then, there are other ways of handling that without relying on UB.

Of course it can't relay in undefined behavior but so far I am not aware of any.

I don't see the difference, a color can passed through the same math operations like any other vector. Color is just a vector. We don't have separate data types for positions, normals, torques, etc.. and we don't need them. So, why would a color need. But if developers decide that separate type is required for a color then it's fine by me but we need color matrix multiply and other typical vector functions for it.

Most compilers are smart enough nowadays to use SIMD under the hood. This is not something you need to worry about.

Yeah, that what I thought too but after studying assembly code it didn't use SIMD at-least not properly. If it could do that then it's one problem off my shoulders.

@dimitry-ishenko
Copy link
Contributor Author

dimitry-ishenko commented Jul 20, 2023

If there is a technical problem (not a matter of taste issue) with the FVECTOR4 implementation then it needs to be fixed of course. But so far I have found no issue there. If there is then it wouldn't be just limited to FVECTOR4. I don't have time to start digging into it right now.

It's not a matter of taste. I've actually liked what you did at first, until I've realized it was Undefined Behavior.

I'm not sure you've understood, but if you initialize FVECTOR4 through one member of the union, you cannot now use other members to access those same values. That's UB. You can only use one member of a union during its lifetime. Please reread the section I quoted.

In other words: as defined right now, the FVECTOR4 union has the following members:

float data[4];
struct { float x, y, z, w; };
struct { float r, g, b, a; };
FVECTOR3 xyz;
FVECTOR3 rgb;

So, if you've initialized the vector through the data[] member, you can't now use x/y/z/w or r/g/b/a or xyz or rgb. If you've initialized it through x/y/z/w, you can't now use data[] or r/g/b/a or xyz or rgb. And so on. That's UB.

As of right now, the Orbiter code is full of UB. Yes it works with MSVC, Clang and gcc today, but it may not work tomorrow. Or it may not work with another compiler or on another platform. That's what I meant by relying on UB.

Of course it can't relay in undefined behavior but so far I am not aware of any.

See above.

I don't see the difference, a color can passed through the same math operations like any other vector. Color is just a vector.

From mathematical standpoint, I don't disagree with you. But would it make sense to assign a position vector to a color vector? Of course not. If you have separate structs for position vectors and color vectors, the compiler will detect these mistakes for you at compile time, instead of you wasting hours of debugging to find them. That's the beauty of strongly typed languages like C++.

We don't have separate data types for positions, normals, torques, etc.. and we don't need them.

Maybe we should... It would probably keep people from making stupid mistakes, like assigning acceleration vector to a position vector. But at this point it would mean a complete rewrite of Orbiter. So, yeah we probably don't need them.

Now, if you want to use FVECTOR4 for colors, we can add a reference type, eg struct COLOUR4Ref { float &r, &g, &b, &a; } and add appropriate free functions and then do this:

FVECTOR4 v1{1, 2, 3, 4}, v2{5, 6, 7, 8};
rgb(v1) = rgb(v2); // would only assign the first 3 elements
rgb(v1).r = 9;

This would keep the syntax close to what it is right now, but without the UB.

What do you think?

@jarmonik
Copy link
Contributor

jarmonik commented Jul 20, 2023

In other words: as defined right now, the FVECTOR4 union has the following members:

float data[4];
struct { float x, y, z, w; };
struct { float r, g, b, a; };
FVECTOR3 xyz;
FVECTOR3 rgb;

So, if you've initialized the vector through the data[] member, you can't now use x/y/z/w or r/g/b/a or xyz or rgb. If you've initialized it through x/y/z/w, you can't now use data[] or r/g/b/a or xyz or rgb. And so on. That's UB.

Oh my god, if that's true then it makes unions next to useless. That kinda undermines the sole purpose of them. So, no more unions just to stay on a safe side. That document you linked is some kind of draft, is any of that actually been approved ? Standard is something that people should be able to rely, but now standard became UB.

Of course, any line of code can become UB (undefined behavior) at any time when someone gets an idea to change the syntax or underlying behavior in a language.

Now, if you want to use FVECTOR4 for colors, we can add a reference type, eg struct COLOUR4Ref { float &r, &g, &b, &a; } and add appropriate free functions and then do this:

FVECTOR4 v1{1, 2, 3, 4}, v2{5, 6, 7, 8};
rgb(v1) = rgb(v2); // would only assign the first 3 elements
rgb(v1).r = 9;

This would keep the syntax close to what it is right now, but without the UB.

What do you think?

If unions are off the table then, we have to find some other way to get easy to use and elegant syntax. Which would likely mean separate color and vector types. I don't know what you mean exactly with the above example but probably something like that, Yes.

@dimitry-ishenko
Copy link
Contributor Author

dimitry-ishenko commented Jul 20, 2023

Oh my god, if that's true then it makes unions next to useless.

You are absolutely right. Unions have been useless for many years. They've been designed to solve one problem -- space constraints. But, they've been abused for many years to solve another problem -- type punning.

That document you linked is some kind of draft, is any of that actually been approved?

That's a current draft, but that particular section has been in the standard for many years. At least since C++11. It's hard to get your hands on the actual approved version -- it costs money, but you can get the latest draft before it got approved for free (and it's very close to the final version).

Follow this link: https://en.cppreference.com/w/cpp/links

For example, §9.5 [class.union] section 1 of the C++11 standard says basically the same thing:

In a union, at most one of the non-static data members can be active at any time, that is, the value of at most one of the non-static data members can be stored in a union at any time...

In order to keep the ABI from breaking we can stick with vectors for now. What I was trying to say with the above example is that instead of having rgb members in the vector class, we can have a function that takes a vector by reference and returns a "proxy" struct with rgb members that are references to the original vector:

struct RGBRef { float &r, &g, &b; };
struct RGBARef { float &r, &g, &b, &a; };

constexpr auto rgb(FVECTOR4& v) { return RGBRef{v.x, v.y, v.z}; }
constexpr auto rgba(FVECTOR4& v) { return RGBARef{v.x, v.y, v.z, v.w}; }

// then you can do this:

FVECTOR4 v1{0, 1, 2, 3};
auto c1 = rgb(v1);

c1.r = 123;
c1.g = 456;
c1.b = 789;

assert(v1.x == c1.r && v1.y == c1.g && v1.z == c1.b);

@TheGondos
Copy link
Contributor

Please don't make up complex solutions to problems that do not exist. We already have the standard committee for that...
Per chapter 11.5.1 sited above, I'm not convinced this is UB :

If a standard-layout union contains several standard-layout structs that share a common initial sequence ([class.mem]), and if a non-static data member of an object of this standard-layout union type is active and is one of the standard-layout structs, it is permitted to inspect the common initial sequence of any of the standard-layout struct members

Maybe there is an issue with the 4th float since the standard talks about "the common initial sequence". This can probably be solved with a dummy float.
I'm tempted to say, what the hell, even if UB in C++ it is correct behaviour in C99 and we'll all be dead before an implementation makes this code not working. At the end of the day, what does it get you? Will it improve portability ? I don't think so. Will it help refactor the code? Probably not. There are probably bigger fish to fry at the moment.
Blindly following guidelines will probably gather more C++ zealot hairsplitters than motivated programmers. C++ is a tool to use, not a religion to follow.
That being said, I do think the idea of merging the different vector types is a good one.
For the methods name, we should keep the names and semantics that are already in OrbiterAPI.h since they are part of the SDK and potentially impact modules. The side effects inside the core is under our control so it should not be that big of a deal.

@dimitry-ishenko
Copy link
Contributor Author

dimitry-ishenko commented Jul 20, 2023

inspect != assign. Yes, you can inspect only, and only if you have a standard-layout union. Are these standard-layout unions? No they are not. The compiler can layout them however it decides fit. So your point is moot, which is exactly why I didn't include this note.

There are probably bigger fish to fry at the moment.

I am not asking you to do anything. I've done all the legwork. The only difference for you is that if you write new code in the future you have to follow slightly different syntax. You can focus on the bigger fish like you have been.

Blindly following guidelines will probably gather more C++ zealot hairsplitters than motivated programmers.

Following the standard has nothing to do with being a zealot. It's like following laws in real life. You follow them because they exist, not because you are a zealot.

@TheGondos
Copy link
Contributor

TheGondos commented Jul 20, 2023

I beg to differ : godbolt
They do have standard layout (the struct, float array and the union itself). And of course you can assign too, the latest union type assigned becomes the active one.
Edit : as for guidelines, I'll let Oscar Cornut from ImGui speak my mind

@dimitry-ishenko
Copy link
Contributor Author

dimitry-ishenko commented Jul 20, 2023

I beg to differ

OK you are correct, but it's irrelevant.

And of course you can assign too, the latest union type assigned becomes the active one.

No it doesn't. It's been made clear in the C++17 version. Once the member becomes active, it's active for the lifetime of the union:

In a union, a non-static data member is active if its name refers to an object whose lifetime has begun and has not ended. At most one of the non-static data members of an object of union type can be active at any time, that is, the value of at most one of the non-static data members can be stored in a union at any time.

So, you are only allowed to inspect the initial sequence. And even that can only be done under certain conditions:

If a standard-layout union contains several standard-layout structs that share a common initial sequence, and if a non-static data member of an object of this standard-layout union type is active and is one of the standard-layout structs, it is permitted to inspect the common initial sequence of any of the standard-layout struct members

So, if you initialize it through the data[] member (which is not a struct), you can't even inspect x/y/z/w or r/g/b/a. And vice versa.

And that's why I'm proposing to stay away from the unions (other than the name, which we must keep for ABI compatibility). There are plenty of other ways to skin the cat. 😅 Again, I am not asking anyone to do any work -- I've already done it, and I am willing to amend it so that everyone is happy. Including Bjarne. 😉

@TheGondos
Copy link
Contributor

This is exactly the kind of hairsplitting I was afraid of. I've said my piece, I won't spend further energy on this.

@jarmonik
Copy link
Contributor

We are not making (compatibility breaking) API changes until the next Orbiter is out and the work for x64 build will begin. There will be API cleanup and if there is a problem with unions then we get a rid of them. (There's no acute issue that would require immedient response.) If the C++ can't handle simple unions anymore then what else is about to fall apart ?

But anyway, my priority is to get the Orbiter in a release condition.

HOLD MERGE until cleared. Need to review the changes in detail and run some compatibility tests and likely make some modifications.

@jarmonik jarmonik mentioned this pull request Jul 26, 2023
@jarmonik jarmonik marked this pull request as draft July 26, 2023 21:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants