Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Request: Add option to disable non-lock-free atomics #1022

Closed
wtywtykk opened this issue Jan 31, 2025 · 17 comments
Closed

Request: Add option to disable non-lock-free atomics #1022

wtywtykk opened this issue Jan 31, 2025 · 17 comments

Comments

@wtywtykk
Copy link

Hi,

In etl's atomic, when the data can't be accessed with compiler's lock-free statements, a spin lock is used.
But when trying to access those atomics from ISR, the spin lock can deadlock.

Here's an example:

#include "hc32_ddl.h"
#include <etl/atomic.h>
#include <math.h>

//#define TEST_LOCK_FREE

#ifndef TEST_LOCK_FREE
#define DATA_CNT 1

struct large_struct
{
	uint32_t buf[DATA_CNT];
};

etl::atomic<large_struct> at;
#else
etl::atomic<uint32_t> at;
#endif

void SysTick_IrqHandler(void)
{
#ifndef TEST_LOCK_FREE
	large_struct new_data;
	for (auto i = 0; i < DATA_CNT; i++)
	{
		new_data.buf[i] = rand();
	}
	at.store(new_data);
#else
	at.store(rand());
#endif
}

int main(void)
{
	// Systick running at 5kHz
	SysTick_Config(SystemCoreClock / 5000);
	NVIC_SetPriority(SysTick_IRQn, 0);

	while (1)
	{
		uint32_t sum = 0;
#ifndef TEST_LOCK_FREE
		large_struct read_data = at.load();
		for (auto i = 0; i < DATA_CNT; i++)
		{
			sum += read_data.buf[i];
		}
#else
		sum += at.load();
#endif

		volatile uint32_t keep = sum;
	}
}

LEDBlink.cpp

The "store" instruction in ISR will wait for "load" in main to release the lock. However since we are in ISR, the lock is never released.

Image

@jwellbelove
Copy link
Contributor

I'll take a look at that.
You could always try using is_lock_free() to throw an assert.
I also will look at adding a compile time flag and adding constexpr to is_lock_free() so this can be used in a static_assert.

@jwellbelove
Copy link
Contributor

jwellbelove commented Jan 31, 2025

Any macro to disable non-lock-free atomics will not work for etl::atomic if the project is set to use the STL, as etl::atomic just typedefs std::atomic in that use case.

template <typename T>
using atomic = std::atomic<T>;

@jwellbelove
Copy link
Contributor

There was a property added to std::atomic with C++17 that could be added static constexpr bool is_always_lock_free.

@wtywtykk
Copy link
Author

wtywtykk commented Feb 1, 2025

I checked the STL from gcc and MSVC:
For MSVC, STL uses a spin-lock like what etl does. That makes sense on windows, since threads normally won't be interrupted.
For gcc, STL uses compiler built-ins directly. When lock-free can be achieved, gcc generates the code, and if not, it will ask user to implement one. And if it's not implemented, there will be linker error.

https://gcc.gnu.org/onlinedocs/gcc-8.4.0/gcc/_005f_005fatomic-Builtins.html

The four non-arithmetic functions (load, store, exchange, and compare_exchange) all have a generic version as well. This generic version works on any data type. It uses the lock-free built-in function if the specific data type size makes that possible; otherwise, an external call is left to be resolved at run time. This external call is the same format with the addition of a ‘size_t’ parameter inserted as the first parameter indicating the size of the object being pointed to. All objects must be the same size.

Is there any reason that etl don't use built-ins for large data types? Can we just use those built-ins? So the user can be notified by a linker error and no unexpected behavior will happen.

jwellbelove added a commit that referenced this issue Feb 1, 2025
@jwellbelove
Copy link
Contributor

jwellbelove commented Feb 4, 2025

I'm adding a static constexpr is_always_lock_free member function to etl::atomic, which has been available from C++17 for std::atomic.
You can use this in a static_assert wherever you need to ensure a lock free atomic.

@jwellbelove
Copy link
Contributor

Is there any reason that etl don't use built-ins for large data types? Can we just use those built-ins? So the user can be notified by a linker error and no unexpected behavior will happen.

Do you mean this?
If there is no pattern or mechanism to provide a lock-free instruction sequence, a call is made to an external routine with the same parameters to be resolved at run time.

@wtywtykk
Copy link
Author

wtywtykk commented Feb 6, 2025

Yes

@KammutierSpule
Copy link

This somehow relates to my previous feature request: #880

@jwellbelove
Copy link
Contributor

jwellbelove commented Feb 7, 2025

Creating a generic, non-OS, ETL mutex would need to be implemented with GCC/Clang atomic builtins.
The downside(?) of this is they would only ever work as spinlocks.

@jwellbelove
Copy link
Contributor

I've been experimenting with the GCC/Clang implementation of atomics for the ETL.

I have come up with a version that will only support types that are 'lock free'.
The requirement is that they fit within the memory size of a supported integral type.

Assuming uint64_t is available then...

  • All integers are lock free
  • float and double are lock free (long double is not supported, this is compiler/platform dependent)
  • Enumerations are lock free
  • Trivially copyable structs and classes are lock free if sizeof(T) is <= the sizeof the storage integer.
  • There is no fall-back to mutexes or spinlocks.

@jwellbelove
Copy link
Contributor

Also, compilation will fail with a static_assert if a suitable atomic cannot be defined.

@wtywtykk
Copy link
Author

Creating a generic, non-OS, ETL mutex would need to be implemented GCC/Clang atomic builtins. The downside(?) of this is they would only ever work as spinlocks.

On single core non-OS platforms we can also disable interrupt, like a critical section.

This reminds me that on Windows, spin lock raises IRQ level before compare_exchange. So ISRs are blocked when the lock is acquired. (Actually on single core CPU raising IRQ level is the only thing that spin lock does)

Assuming uint64_t is available then...

Unfortunately on Cortex M4, only uint32_t or smaller is atomic...

@jwellbelove
Copy link
Contributor

Unfortunately on Cortex M4, only uint32_t or smaller is atomic...

That would limit the max size of object that could be supported.
float would probably be available, but not double.

What do you think about the idea of not supporting non-lock-free atomics?

@jwellbelove
Copy link
Contributor

I've been reading an article about how user-space spinlocks should be avoided at all cost, due to OS thread scheduling issues.

@wtywtykk
Copy link
Author

wtywtykk commented Feb 13, 2025

What do you think about the idea of not supporting non-lock-free atomics?

For me, I only use lock free ones.
I think for a embedded project, it's nearly impossible to have a perfect solution for non-lock-free atomics. And if the user do want one, there's nothing that stops them from making their own ones.

@wtywtykk
Copy link
Author

I also give the GCC atomic a try. It works for small structures, enums. But for large structures, it has 2 problems:

First, GCC keeps complaining about

error : definition of 'void __atomic_load(unsigned int, const volatile void*, void*, int)' ambiguates built-in declaration 'void __atomic_load(unsigned int, const volatile void*, void*, int)'

I have no idea about the difference. If I move them into a C file, the error is gone. But I suspect it's just hiding the problem.

Second, if we want to make some lock for the data. We can't pass the per instance lock to the built-ins. So we have to use a global one. That's not good.

@jwellbelove
Copy link
Contributor

Fixed 20.40.0

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants