title | toc |
---|---|
Informal Pseudocode Specification |
true |
The language we use is a variant of Donkey's, which, in turn, is a well-defined subset of IB Pseudocode.
- Tokens are separated via operators or whitespace. Unnecessary whitespace is ignored. In this context, whitespaces are tabs (U+0009) or spaces (U+0020).
- Statements are separated by line feeds (U+000A).
- On this page, regular expressions are POSIX extended regular expressions. (We might switch to ABNF in the future.)
- Block comments begin with
/*
and end with*/
. - Inline comments begin with
//
until a line feed. - Comment characters, as specified above, are ignored in string literals.
Since IB Pseudocode is dynamically typed, all values are basically a tagged
union. See ibpctypes.h
for details.
Variables are not declared; they are automatically "declared" when assigned to.
All variables should be capitalized, but this is not enforced.
All simple types are passed by value (even strings).
Type | Description | Implementation | Literal form |
---|---|---|---|
Integer | A whole number | long long |
-?[0-9]+ in base 10 |
Real | A floating point number | double |
-?[0-9]+\.[0-9]+ |
String | An array of bytes | a struct with a cap, a size, and char *data |
"[^"\n]*" |
Boolean | A boolean | bool |
(true|false) |
Null | No meaningful value | NULL |
null |
Note that:
- There are limits to the size of integers and the precision of reals.
We could have used GNU MP or similar libraries to
support arbitrary precision integers and reals, but that's a bit too much of a
hassle due to the amount of temporary variables needed to make GMP arithmetic
functions work. Patches welcome, though!
long long
isint64_t
on many systems.double
is IEEE 754 binary64 on virtually all systems.
All compound types are passed by reference.
Type | Description | Implementation | Literal form |
---|---|---|---|
List | An ordered container for a group of values | a linked list | \[\v\s*(,\s*\v\s*)*\] where \v is any value |
Stack | A FILO | ||
Queue | A FIFO |
Stacks and queues are a distant target to implement.
int(X)
convertsX
to an integer.- If
X
is a boolean,0
or1
is returned. - If
X
is an integer, it is returned. - If
X
is a real, it is floored and an integer is returned. - If
X
is a string,strtoll
is used to convert the string to an integer. Base-10 is assumed.
- If
real(X)
convertsX
to a real.- If
X
is a boolean,0.0
or1.0
is returned. - If
X
is an integer, it is cast to a real and returned. - If
X
is a real, it is returned. - If
X
is a string,strtod
is used to convert it into a real. Base-10 is assumed.
- If
str(X)
convertsX
to a string.- If
X
is a boolean,"false"
or"true"
is returned. - If
X
is an integer or real, it is converted to a string withsprintf
. - If
X
is a list, the list is traversed and each item is recursively converted to a string, and the result is concatenated with,
between each item,{
is prepended, and}
is appended. Specially, any items that are strings are wrapped in"
. - If
X
is a string, it is returned.
- If
input X
reads one line of standard input, terminated by a line feed, and stores the results as a string inX
.X
may or may not be previously defined.output
could take one or more arguments:output X
causes X to be converted to a string and written to standard output, followed by a line feed.output \v\s*(,\s*\v\s*)*
, where\v
represents any value, causes all of the arguments to be converted to strings, concatenated, and written to standard output, followed by a line feed.
A + B
evaluates to the sum ofA
andB
, whereA
andB
are integers or reals.- If both are of the same type, their original type is returned.
- If one is an integer but another is a real, the return type is a real.
A - B
evaluates to difference betweenA
andB
by subtractingB
fromA
, whereA
andB
are integers or reals.- If both are of the same type, their original type is returned.
- If one is an integer but another is a real, the return type is a real.
- This may also be used as an unary operator that simply negates B.
A * B
evaluates to the product ofA
andB
, whereA
andB
are integers or reals.- If both are of the same type, their original type is returned.
- If one is an integer but another is a real, the return type is a real.
A / B
evaluates to the quotient ofA
divided byB
, whereA
andB
are integers or reals. A real is returned.A div B
evaluates to the quotient ofA
divided byB
, whereA
andB
are integers or reals. A floored integer is returned.A mod B
orA % B
evaluates to the modulus ofA
divided byB
, whereA
andB
are integers or reals. An integer is returned.
The behavior of division functions (/
, div
, and mod
) is undefined when the
divisor is zero. (Generally speaking, they cause SIGFPE on amd64 and evaluate to
0 on aarch64, though the underlying C compiler may have optimizations that
change this. Divison by zero is not recommended as doing so might create
nasal demons.)
All comparison operators evaluate to booleans that describe whether the corresponding condition is true or false.
Form | Description |
---|---|
A == B |
A is equal to B |
A != B |
A is not equal to B |
A < B |
A is strictly less than B |
A <= B |
A is less than or equal to B |
A > B |
A is strictly greater than B |
A >= B |
A is greater than or equal to B |
==
and!=
work on simple types.<
,<=
,>
, and>=
only work on integers and reals. Integers and reals may be compared with each other.
Form | Description |
---|---|
A and B |
Both A and B are true |
A or B |
A and/or B is true |
not X |
X is not true |
!X |
Same as not X , but with higher precedence |
The smaller the list item index. the higher the precedence:
!
,-
(negation)*
,div
,mod
,/
,%
+
,-
(subtraction)==
,!=
,>=
,<=
,>
,<
not
and
or
Parentheses (
and )
may be used to group an expression.
Functions are defined in the following form:
func function_name(PARAM_0, PARAM_1, ..., PARAM_N)
...
end function_name
Note that in the original IB specifications, functions are defined without the
func
keyword. This is ambiguous, as there is no way to distinguish between
function calls and function definitions, due to the lack of semicolons,
curly braces, or meaningful whitespace. The func
keyword used as a workaround.
If a return
keyword is encountered during the execution of the
function, the function terminates, control is passed back to the caller, and the
function evaluates to argument passed to return
. A return
statement is
illegal outside functions. Having multiple arguments passed to it is also
illegal.
If the end of a function is reached without encountering a return
statement,
the function evaluates to null
.
A function can be invokved as function_name(PARAM_0, PARAM_1, ..., PARAM_N)
.
If the parameters are function calls, they are evaluated sequentially, from the 0th to the Nth parameter.
The if
statement may be used to conditionally execute code based on a runtime
boolean value.
In the following example, ...
runs iff CONDITIONAL_VALUE
is true
.
CONDITIONAL_VALUE
must be a boolean.
if CONDITIONAL_VALUE then
...
end if
An else
branch may be added, where the contents are run iff
CONDITIONAL_VALUE
is false
.
if CONDITIONAL_VALUE then
... // executed if true
else
... // executed if false
end if
Statements may be nested, and only one end if
is needed:
if CONDITIONAL_VALUE then
...
else if ANOTHER_CONDITION then
...
else
...
end if
The last else
branch is optional.
The following causes CONDITION to be evaluated. If it is true, the body of the
loop ...
is executed. If it is false, the loop is skipped. If the loop is
being executed and the end of the loop content has been reached, the condition
is checked again and the loop repeats if the condition is true.
loop while CONDITION
...
end loop
When written in goto
s in C:
begin_loop:
if (!CONDITION)
goto end_loop;
...
goto begin_loop;
end_loop:
The following causes CONDITION to be evaluated. If it is false, the body of the
loop ...
is executed. If it is true, the loop is skipped. If the loop is
being executed and the end of the loop content has been reached, the condition
is checked again and the loop repeats if the condition is false.
loop until CONDITION
...
end loop
This is equivalent to:
loop while (!CONDITION)
...
end loop
The IB Pseudocode for
loop can only loop a variable from one value to another,
with 1 as the step. Both the beginning and the end are inclusive.
loop I from BEGINNING to END
...
end loop
This is equivalent to the following C:
for (long long I = BEGINNING; I < END + 1; ++i) {
...
}
The break
statement causes the inner-most loop to exit immediately, skipping
the rest of the current iteration and all further iterations (that is, unless if
the loop itself is entered again).
The continue
statement jumps to the end of the current iteration.
These statements work on both while
and for
loops.
Memory is usually allocated automatically with malloc()
. Stack variables are
rarely used.
Currently, free()
is never called, and memory is always leaked. Inline C or
inline assmely may be used to free memory when necessary. The usual
considerations in use-after-free and double-free apply; it is recommended to run
with MALLOC_OPTIONS
set to CFGJUX
on OpenBSD; see
malloc(3).
I might add a garbage collector in the future. I will not add automatic reference counting or borrow checking.
Each line, where the first non-whitespace character sequence is $
, is
printed verbatim into the output C code, with everything up to and including the
first $
removed.
Each line, where the first non-whitespace character sequence is ~
, is
treated as inline assembly.
All identifiers written in the pseudocode language may be accessed in C by
prefixing them with ibpc_
.
The behavior of using threading, any form of async, signal handlers, or freeing memory, is undefined.
On OpenBSD, you should use inline C to invoke pledge(2) and unveil(2) to harden the program. The usual methods apply, such as:
$#ifdef __OpenBSD__
$error = unveil("file_to_read.txt", "r");
$if (error) err(1, "unveil");
$error = unveil(NULL, NULL);
$if (error) err(1, "unveil");
$error = pledge("stdio", NULL);
$if (error) err(1, "pledge");
$#endif