Draft terminology related to API behavior definitions #20959

Closed
334 changes: 334 additions & 0 deletions doc/api18970.md
@@ -0,0 +1,334 @@
# Draft material to review in support of [#18970][]

[//]: # (cmark-gfm doc/api18970.md > /tmp/api18970.html)

[#18970]: https://github.com/zephyrproject-rtos/zephyr/issues/18970

## Background

This section summarizes existing and relevant Zephyr concepts,
introducing clarifications where necessary. See also the Zephyr
[glossary][].

[glossary]: https://docs.zephyrproject.org/latest/glossary.html

### Thread Terminology

Thread behavior in Zephyr is documented in three locations which in
aggregate are not entirely consistent:
* [Threads](https://docs.zephyrproject.org/latest/reference/kernel/threads/index.html)
* [Scheduling](https://docs.zephyrproject.org/latest/reference/kernel/scheduling/index.html)
* The [Interrupt](https://docs.zephyrproject.org/latest/guides/porting/arch.html#interrupt-and-exception-handling)
and [Context Switching](https://docs.zephyrproject.org/latest/guides/porting/arch.html#thread-context-switching)
sections of the [Architecture Porting Guide](https://docs.zephyrproject.org/latest/guides/porting/arch.html)

Zephyr defines [six thread
states](https://docs.zephyrproject.org/latest/reference/kernel/threads/index.html#thread-states)
of which four are active:

* **Ready** when there is nothing that prevents the thread from becoming
active as the current thread, but it is not the current thread.
* **Running** when the thread is active as the current thread (on a
processor).
* **Waiting** when the thread is on a queue waiting for an event to occur
that will transition it to *Ready*.
* **Suspended** when the thread is inactive and must be transitioned to
*Ready* explicitly (e.g. via `k_thread_resume()`).

A thread is made **unready** if it transitions to *Suspended*,
*Waiting*, or *Terminated*.

Zephyr defines [two mechanisms for selecting the running
thread](https://docs.zephyrproject.org/latest/reference/kernel/scheduling/index.html#scheduling):
* In **cooperative** scheduling a thread transitions from *Running* only
when it explicitly invokes an operation that makes it unready
(i.e. *Waiting* or *Suspended*) or invokes `k_yield()`.
* In **preemptive** scheduling a thread may be involuntarily
transitioned from *Running* to *Ready* based on a change in
conditions, such as elapsed runtime or transition of a higher-priority
thread to a *Ready* state.

A **reschedule point** is any point where the kernel calls the internal
logic that selects the next thread to run. Reschedule points include:
* Return from an ISR to thread execution;
* Invoking `k_yield()` or any other action that makes the current thread
unready;
* Most (all?) actions that cause a thread to become *Ready*.

Invoking a function that reaches a reschedule point does not guarantee a
context switch; it simply provides an opportunity for one. Whether a
context switch occurs depends on the class and priority of the current
thread and of the thread at the head of the ready queue.

Zephyr defines [two core classes of
threads](https://docs.zephyrproject.org/latest/reference/kernel/threads/index.html#thread-priorities)
with one [extension](https://docs.zephyrproject.org/latest/reference/kernel/scheduling/index.html#meta-irq-priorities):

* **preemptible** threads remain the current thread until
  * a cooperative thread becomes *Ready*; or
  * a higher-priority preemptible thread becomes *Ready*; or
  * the thread invokes an operation that explicitly causes it to become
    unready.
* **cooperative** threads remain the current thread until something
causes them to become unready or they invoke `k_yield()`.
* **meta-irq** threads are cooperative threads with the special property
that they *do* pre-empt cooperative threads with lower priorities,
*and* are not excluded when using a scheduler lock.

The class of the thread is uniquely determined by its priority: All
cooperative threads have a priority higher than any preemptible thread,
and meta-IRQ threads have a priority higher than any other cooperative thread.
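
The relationship between priority, thread class, and the outcome of a
reschedule point can be sketched as a small host-side model. This is
an illustration only, not kernel code: the names `thread_class_of()`
and `reschedule_point_switches()` are hypothetical, the two constants
stand in for `CONFIG_NUM_COOP_PRIORITIES` and
`CONFIG_NUM_METAIRQ_PRIORITIES`, and the model assumes the current
thread is still ready, with time slicing and scheduler locks ignored.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed values standing in for CONFIG_NUM_COOP_PRIORITIES and
 * CONFIG_NUM_METAIRQ_PRIORITIES; a real application sets these in
 * Kconfig.
 */
#define NUM_COOP_PRIORITIES 16
#define NUM_METAIRQ_PRIORITIES 2

enum thread_class {
	THREAD_PREEMPTIBLE,
	THREAD_COOPERATIVE,
	THREAD_META_IRQ,
};

/* Numerically lower values are higher priorities: negative values are
 * cooperative, and the most negative ones are assumed to be meta-irq.
 */
enum thread_class thread_class_of(int prio)
{
	assert(prio >= -NUM_COOP_PRIORITIES);

	if (prio >= 0) {
		return THREAD_PREEMPTIBLE;
	}
	if (prio < -NUM_COOP_PRIORITIES + NUM_METAIRQ_PRIORITIES) {
		return THREAD_META_IRQ;
	}
	return THREAD_COOPERATIVE;
}

/* Hypothetical model: does a reschedule point switch away from a
 * still-ready current thread when ready_prio heads the ready queue?
 */
bool reschedule_point_switches(int current_prio, int ready_prio)
{
	if (ready_prio >= current_prio) {
		return false; /* not a higher priority */
	}
	if (thread_class_of(current_prio) == THREAD_PREEMPTIBLE) {
		return true; /* any higher-priority thread preempts */
	}
	/* Cooperative current thread: only meta-irq threads preempt it. */
	return thread_class_of(ready_prio) == THREAD_META_IRQ;
}
```

In this model a cooperative thread at priority -1 keeps running when a
cooperative thread at -5 becomes ready, but not when a meta-irq thread
at -16 does.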

Most Zephyr threads are cooperative, and some code may take advantage of
this by assuming the behavior of reschedule points reached during an
operation will be guided by the invoking thread being cooperative.

Given this, define the following terms:

* A thread **suspends** when it voluntarily invokes a function that
causes it to transition from *Running* to *Suspended*.
(`k_thread_suspend()` is such a function.)
* A thread **waits** when it voluntarily invokes a function that causes
it to transition from *Running* to *Waiting*. (`k_sleep()` is such a
function.)
* A thread **yields** when it voluntarily invokes a function that causes
it to transition from *Running* to *Ready*. (`k_yield()` is such a
function.)
* A thread is **preempted** when it is *Running* but a reschedule point
(involuntarily) transitions it to *Ready*.
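
The four terms above, together with *unready*, can be restated as a
lookup over the state a thread enters when it leaves *Running*. The
following is a minimal sketch for checking the definitions against each
other; the enum and function names are hypothetical and do not
correspond to any kernel API.

```c
#include <assert.h>
#include <stdbool.h>

enum thread_state {
	THREAD_READY,
	THREAD_RUNNING,
	THREAD_WAITING,
	THREAD_SUSPENDED,
	THREAD_TERMINATED,
};

enum transition_term {
	TERM_NONE,      /* no term defined in this section */
	TERM_SUSPENDS,
	TERM_WAITS,
	TERM_YIELDS,
	TERM_PREEMPTED,
};

/* A thread is unready in exactly these three states. */
bool thread_state_is_unready(enum thread_state s)
{
	return s == THREAD_WAITING || s == THREAD_SUSPENDED ||
	       s == THREAD_TERMINATED;
}

/* Map a transition out of Running to the term defined for it. */
enum transition_term term_for_leaving_running(enum thread_state to,
					      bool voluntary)
{
	assert(to != THREAD_RUNNING);

	switch (to) {
	case THREAD_SUSPENDED:
		return voluntary ? TERM_SUSPENDS : TERM_NONE;
	case THREAD_WAITING:
		return voluntary ? TERM_WAITS : TERM_NONE;
	case THREAD_READY:
		return voluntary ? TERM_YIELDS : TERM_PREEMPTED;
	default:
		return TERM_NONE;
	}
}
```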

A **context switch** occurs when the reschedule point selects a thread
other than the current thread to execute next.

**NOTE** The term "sleeping" when applied to Zephyr threads has
historically meant a thread that is *Waiting* for a timeout event.

For API behavior we are primarily interested in whether invoking a
function can cause a context switch from a cooperative thread.

### Context Terminology

**TODO** fill this out

This section is intended to describe the privilege and processor context
variations in which a thread can run. It should provide or reference
definitions of terms like this:

* Kernel [initialization
level](https://docs.zephyrproject.org/latest/reference/drivers/index.html#initialization-levels)
* User-space versus kernel (system call)
* Normal ("thread"?) versus interrupt (invoked from an Interrupt Service
Routine)

For API behavior we are interested in whether a particular function
**may**, **must**, or **must not** be invoked from a specific context.

### Function vs Operation

tl;dr: A function *returns* while an operation *completes*.

A **function** is an addressable sequence of actions to which control is
transferred by **invoking** the function with various parameters, and
from which control is transferred back to the point where it was invoked
when the function **returns**.

A function may succeed or fail. In many cases failure is indicated by a
negative integer return value while success is indicated by a
non-negative integer return value, but specific functions may use other
conventions.

An **operation** is behavior, which is generally initiated by invoking a
function. An operation can also succeed or fail. An operation
**completes** when the success or failure of the operation has been
determined and made available through whatever mechanism was specified
through the initiating function.

For APIs we are particularly interested in the relationship between the
function and its associated operation. Some potential relationships
include:
* The function succeeds if and only if its associated operation
completes with success. (This is usually expected of *synchronous*
functions.)
* If the function fails then its associated operation has not been
initiated. If the function succeeds then its associated operation has
been initiated. The success or failure of the operation is not
otherwise coupled to success or failure of its initiating function.
(This is usually expected of *asynchronous* functions.)
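
The second relationship can be illustrated with a host-side sketch in
which the function's return value reports only whether the operation was
initiated, while the operation's own result arrives later through a
callback. All names here (`xfer_start()`, `xfer_finish()`,
`struct xfer_dev`) are hypothetical; `xfer_finish()` stands in for
whatever ISR or work item would complete a real operation.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

typedef void (*xfer_done_t)(int status, void *user_data);

struct xfer_dev {
	bool busy;
	xfer_done_t done;
	void *user_data;
};

/* The function: succeeds (returns 0) if and only if the operation was
 * initiated. It says nothing about how the operation will complete.
 */
int xfer_start(struct xfer_dev *dev, xfer_done_t done, void *user_data)
{
	assert(dev != NULL);

	if (dev->busy) {
		return -EBUSY; /* function fails: operation not initiated */
	}
	dev->busy = true;
	dev->done = done;
	dev->user_data = user_data;
	return 0;
}

/* The operation completes: its status is delivered through the
 * mechanism given to the initiating function.
 */
void xfer_finish(struct xfer_dev *dev, int status)
{
	xfer_done_t done = dev->done;

	dev->busy = false;
	dev->done = NULL;
	if (done != NULL) {
		done(status, dev->user_data);
	}
}

/* Example completion callback: record the status for the caller. */
void record_status(int status, void *user_data)
{
	*(int *)user_data = status;
}
```

Here `xfer_start()` can succeed even though the operation it initiated
later completes with `-EIO`, which is the decoupling described above.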

## Definitions

### rescheduling (function)

A function is rescheduling when it invokes an operation that includes a
reschedule point (potentially, but not necessarily, context switching to
a non-current thread).

Note that whether a reschedule-point function will cause a context
switch depends on the priority of the current and ready threads.

#### Commentary

The term *rescheduling* applies only to functions that explicitly invoke
operations that may encounter a reschedule point. Unless interrupts
are disabled, a thread executing any function may be context-switched if
an interrupt causes a meta-irq thread to become ready.

### reschedule-safe (function)

A function is reschedule-safe if it is either not rescheduling, or if it
is rescheduling but no reschedule point reached during its execution
would cause a cooperative thread invoking it to be context-switched.

#### Commentary

For thread safety it is generally necessary to hold a lock of some sort.
If a sequence of code involves only invocation of functions that do not
context-switch then the lock may be avoidable.

See also discussion of the term "block" at the definition of
*synchronous*.

Note that if a function is marked as isr-callable then it
must internally check `k_is_in_isr()` and behave as a no-wait function,
whether or not it was invoked in a no-wait mode.

### context-switching (function)

A function is context-switching when it is rescheduling and is
guaranteed to cause a context switch when the head of the ready queue is
not the current thread, even for cooperative threads.

#### Commentary

Examples include `k_yield()` and `k_sleep()` (TBD: only when passed a
non-zero timeout?).

### no-wait (function)

A rescheduling function that is not reschedule-safe is no-wait when
calls to it can be forced by a parameter into a reschedule-safe path,
i.e. one that will not trigger a context switch from a cooperative
thread.

#### Commentary

A no-wait function invoked in no-wait mode in conditions where it cannot
initiate its operation should fail, in many cases by returning the value
`-EWOULDBLOCK`.
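
A minimal host-side sketch of a no-wait function, under stated
assumptions: `resource_take()` and `struct resource_model` are
hypothetical, the `in_isr` argument stands in for a `k_is_in_isr()`
check, and the waiting path is only modelled by a counter and a sentinel
return value because a host program has no wait queue to pend on.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Sentinel for the modelled waiting path; a real function would not
 * return here until the resource became available.
 */
#define RESOURCE_WOULD_WAIT 1

struct resource_model {
	unsigned int available; /* units that can be taken immediately */
	unsigned int pended;    /* callers that took the waiting path */
};

int resource_take(struct resource_model *res, bool no_wait, bool in_isr)
{
	assert(res != NULL);

	if (res->available > 0) {
		res->available--;
		return 0; /* reschedule-safe: nothing to wait for */
	}
	if (no_wait || in_isr) {
		/* Forced onto the reschedule-safe path: fail instead of
		 * making the caller unready.
		 */
		return -EWOULDBLOCK;
	}
	/* Not reschedule-safe: a real implementation would place the
	 * current thread on a wait queue here.
	 */
	res->pended++;
	return RESOURCE_WOULD_WAIT;
}
```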

### isr-callable (function)

A function is isr-callable if and only if (TBD: it is reschedule-safe).

### thread-safe

A function is thread-safe if its behavior is correct when invocations
from multiple threads are active simultaneously.

#### Commentary

See *rescheduling*.

Note that it may be impossible to guarantee thread safety when using
Zephyr preemptible or meta-irq threads.

### reentrant

A function is reentrant if its behavior is correct when it is invoked by
(indirect) recursion from the same thread.
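
As an illustration (not taken from any Zephyr source), the two
hypothetical functions below both try to sum the values in a binary
tree. The first keeps per-invocation state in a `static` variable that
nested invocations overwrite; the second keeps all state in a
caller-supplied object and is reentrant.

```c
#include <assert.h>
#include <stddef.h>

struct node {
	int value;
	const struct node *left;
	const struct node *right;
};

/* Not reentrant: the nested calls overwrite scratch before the outer
 * invocation reads it back.
 */
int tree_sum_static(const struct node *n)
{
	static int scratch;

	if (n == NULL) {
		return 0;
	}
	scratch = n->value;
	int l = tree_sum_static(n->left);
	int r = tree_sum_static(n->right);

	return scratch + l + r; /* scratch may no longer be n->value */
}

/* Reentrant: all state lives in the caller-supplied total. */
void tree_sum_into(const struct node *n, int *total)
{
	assert(total != NULL);

	if (n == NULL) {
		return;
	}
	*total += n->value;
	tree_sum_into(n->left, total);
	tree_sum_into(n->right, total);
}
```

For a root with value 1 and leaf children 2 and 3, `tree_sum_into()`
accumulates 6 while `tree_sum_static()` returns 8.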

### interrupt-safe

A function is interrupt-safe if its behavior is not affected by
concurrent access to shared data from interrupts.

#### Commentary

Most public API will satisfy this condition; some private API may not.

We need to be able to say succinctly "Unless otherwise specified all API
functions are interrupt-safe" and expect people to know what that means.
A specific example would be the GPIO API. Because GPIO write functions
may be invoked from ISRs, read-modify-write code like:

```
u32_t out = gpio->OUT;
gpio->OUT = (out & ~mask) | (value & mask);
```

*must* be wrapped in a spin-lock to be interrupt-safe. Many current
implementations do not satisfy this requirement.
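
A host-side sketch of the locked read-modify-write follows. It is a
model only: `struct gpio_model` and `gpio_write_masked()` are
hypothetical, and the C11 `atomic_flag` spin loop stands in for a Zephyr
spin-lock (`k_spin_lock()` / `k_spin_unlock()`), which on a real target
also masks interrupts on the local CPU. The stand-in alone would not
protect against an ISR on a uniprocessor.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

struct gpio_model {
	volatile uint32_t OUT; /* models the output data register */
	atomic_flag lock;      /* stand-in for a struct k_spinlock */
};

/* Set the bits selected by mask to the corresponding bits of value,
 * holding the lock across the whole read-modify-write sequence.
 */
void gpio_write_masked(struct gpio_model *gpio, uint32_t mask,
		       uint32_t value)
{
	assert(gpio != NULL);

	/* Acquire: stand-in for key = k_spin_lock(&lock). */
	while (atomic_flag_test_and_set_explicit(&gpio->lock,
						 memory_order_acquire)) {
		/* spin until the holder releases the lock */
	}

	uint32_t out = gpio->OUT;

	gpio->OUT = (out & ~mask) | (value & mask);

	/* Release: stand-in for k_spin_unlock(&lock, key). */
	atomic_flag_clear_explicit(&gpio->lock, memory_order_release);
}
```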

On the other hand internal functions may be written to assume they are
called with interrupts disabled, or a specific lock held.

**TODO** standard marking?

### atomic

An operation is atomic if the steps it makes internally cannot be
affected by nor visible to interleaving executions, such as from
interrupts or thread pre-emption.

#### Commentary

An operation that is atomic is by definition interrupt-safe.

An operation that is atomic is by definition thread-safe.

### synchronous (function) (vs asynchronous)

A function is synchronous if it will not return until the operation it
initiates has completed.

#### Commentary

Informally the term "blocking" may be used for a synchronous function,
but colloquially it may imply that the function involves placing the
invoking thread onto a wait queue (which is one way a function
suspends). It is possible to have a synchronous non-suspending
function; an example is `k_busy_wait()`.

### asynchronous (function) (vs synchronous)

A function is asynchronous if it may return before the operation it
initiates has completed. An asynchronous function will generally
provide a mechanism by which operation completion is reported, e.g. a
callback or event.

#### Commentary

Note that asynchronous is orthogonal to context-switching. Some API may
provide completion information through a callback, but may suspend while
waiting for the resource necessary to initiate the operation; an example
is `spi_transceive_async()`.

### queued (proposed, TBD)

A function is queued if it is asynchronous and allows multiple
operations to be outstanding at any time.

#### Commentary

This concept is proposed due to operations like `spi_transceive_async()`
which returns its result through a signal but will suspend if the device
is already processing an asynchronous operation.

A related capability that is non-suspending could be implemented by
passing the operation parameters in a chainable, persisted state object
that can be added to an internal queue for processing when the required
resource is available. Such a theoretical API might be described as
*queued*.
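
One possible shape for such an API, sketched on the host under stated
assumptions: `struct queued_op`, `op_submit()` and `op_process_next()`
are hypothetical names, the "operation" is a placeholder computation,
and the locking a real driver would need around the queue is omitted.

```c
#include <assert.h>
#include <stddef.h>

/* Chainable, persisted state object: the caller owns the storage and
 * must keep it valid until the operation completes.
 */
struct queued_op {
	struct queued_op *next; /* link used while the op is queued */
	int param;              /* hypothetical operation parameter */
	int result;             /* filled in at completion */
	void (*done)(struct queued_op *op);
};

struct op_queue {
	struct queued_op *head;
	struct queued_op *tail;
};

/* Never suspends: the operation is appended regardless of how many
 * operations are already outstanding.
 */
void op_submit(struct op_queue *q, struct queued_op *op)
{
	assert(q != NULL && op != NULL);

	op->next = NULL;
	if (q->tail != NULL) {
		q->tail->next = op;
	} else {
		q->head = op;
	}
	q->tail = op;
}

/* Called when the resource is available: run the oldest queued
 * operation to completion and report it. Returns NULL if idle.
 */
struct queued_op *op_process_next(struct op_queue *q)
{
	assert(q != NULL);

	struct queued_op *op = q->head;

	if (op == NULL) {
		return NULL;
	}
	q->head = op->next;
	if (q->head == NULL) {
		q->tail = NULL;
	}
	op->next = NULL;
	op->result = op->param * 2; /* placeholder for the real work */
	if (op->done != NULL) {
		op->done(op);
	}
	return op;
}
```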

Since this term specifies the mechanism by which a non-suspending
asynchronous function supports multiple incomplete operations, rather
than just that behavior, it should probably be avoided.

## Other Rules

## To Do

- [ ] Consider a standard marking for private functions that must be
invoked with a lock held or interrupts disabled, such as a suffix
`_locked`.
- [ ] Define the terminology related to execution context