stytri/oboe

OBOE: Only Binary Operator Expressions

About

This is a toy language inspired by the C ternary operator.

It originated when I was thinking it would be nice to have an equivalent of the C ternary operator for the switch statement. This then expanded into: why not make everything an operator and eliminate keywords?

From this I arrived at the following guidelines for the implementation:

  • Operators not keywords
  • Binary operators only
  • Only use Standard C
  • Use it as a sandbox for other ideas

Development Environment

Windows 7

jEdit Editor

MinGW (GCC) Compiler

CodeBlocks (to build and debug)

Git Source Control Management

FreeCommander File Manager

ConEmu Console Terminal

Implementation

Implemented as an executable syntax tree interpreter.

Lexicon

UTF-8 encoding.

Comments

Comments start with #.

If followed by one of (, [, {, the comment is terminated by the corresponding ), ], }. The active brackets can be nested.

Otherwise comments are terminated at the end of line.
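
The comment rules can be sketched in Python as follows. This is only an illustration, not the code in the OBOE sources: the function name skip_comment is hypothetical, and consuming an unterminated bracketed comment to the end of input is an assumption, since that case is not described here.

```python
# Hypothetical sketch of the comment rules; not taken from the OBOE sources.
PAIRS = {"(": ")", "[": "]", "{": "}"}

def skip_comment(text, pos):
    """Return the index of the first character after the comment at text[pos]."""
    assert text[pos] == "#"
    pos += 1
    opener = text[pos] if pos < len(text) else ""
    if opener in PAIRS:
        # Bracketed comment: only the active bracket pair is counted.
        closer = PAIRS[opener]
        depth = 0
        while pos < len(text):
            ch = text[pos]
            pos += 1
            if ch == opener:
                depth += 1
            elif ch == closer:
                depth -= 1
                if depth == 0:
                    return pos
        return pos  # unterminated: consume to end of input (assumption)
    # Line comment: stop at the end of line, leaving the newline in place.
    while pos < len(text) and text[pos] not in "\r\n":
        pos += 1
    return pos
```

Given "a #( note (nested) ) b", the function skips through the closing ) that matches the first (, so the nested pair stays inside the comment. A ) inside a #[ ... ] comment does not affect nesting.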

Integers

Decimal

[0-9]+([Ee][0-9]+)?

Hexadecimal

(0x|0X)[0-9A-Fa-f]+([Pp][0-9]+)?

Floats

Decimal

[0-9]+\.[0-9]*([Ee][-+]?[0-9]+)?

Hexadecimal

(0x|0X)[0-9A-Fa-f]+\.[0-9A-Fa-f]*([Pp][-+]?[0-9]+)?
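
As a quick check of the four numeric patterns, the following Python sketch classifies a lexeme by full-matching it against each one. The classify function and the category names are hypothetical, and the hexadecimal integer pattern is assumed to accept the digits A-F in the same way as the hexadecimal float pattern.

```python
import re

# Patterns transcribed from the Integers and Floats sections.
# Assumption: hexadecimal integers accept A-F, as hexadecimal floats do.
PATTERNS = [
    ("decimal integer", re.compile(r"[0-9]+([Ee][0-9]+)?")),
    ("hexadecimal integer", re.compile(r"(0x|0X)[0-9A-Fa-f]+([Pp][0-9]+)?")),
    ("decimal float", re.compile(r"[0-9]+\.[0-9]*([Ee][-+]?[0-9]+)?")),
    ("hexadecimal float",
     re.compile(r"(0x|0X)[0-9A-Fa-f]+\.[0-9A-Fa-f]*([Pp][-+]?[0-9]+)?")),
]

def classify(lexeme):
    """Return the name of the first pattern that matches the whole lexeme."""
    for name, pattern in PATTERNS:
        if pattern.fullmatch(lexeme):
            return name
    return None
```

Two consequences of the patterns as written: a float needs at least one digit before the point, so ".5" matches nothing, and an integer exponent carries no sign, so "1e3" is an integer while "1e-3" matches neither integer nor float.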

Strings

Delimited by ".

Can contain escape sequences (see the Escape sequence section).

Characters

Delimited by '.

Can contain an escape sequence.

Escape sequence

Initiated by \, followed by:

  • 0 inserts a nul character.
  • n inserts a new-line character.
  • t inserts a horizontal-tab character.
  • U or u followed by up to 8 hexadecimal digits specifying a Unicode code-point.
  • W or w followed by up to 4 hexadecimal digits specifying a Unicode code-point.
  • X or x followed by up to 2 hexadecimal digits specifying a Unicode code-point.
  • End-of-Line characters; these are elided, including CR-LF and LF-CR pairings.
  • for other characters, acts as a quoting mechanism.
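
The escape rules above can be sketched in Python like this. It is an illustration under assumptions, not the OBOE lexer: decode_escapes is a hypothetical name, hexadecimal digits are consumed greedily up to the stated maximum, and a U/W/X escape followed by no hexadecimal digit falls back to the quoting rule.

```python
HEX_DIGITS = "0123456789abcdefABCDEF"
MAX_HEX = {"u": 8, "w": 4, "x": 2}      # maximum hex digits per escape letter
SIMPLE = {"0": "\0", "n": "\n", "t": "\t"}

def decode_escapes(text):
    """Decode backslash escapes in the body of a string or character literal."""
    out = []
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        i += 1
        if ch != "\\" or i >= n:
            out.append(ch)               # ordinary character (or trailing backslash)
            continue
        esc = text[i]
        i += 1
        if esc in SIMPLE:
            out.append(SIMPLE[esc])
        elif esc.lower() in MAX_HEX:
            start = i
            limit = MAX_HEX[esc.lower()]
            while i < n and i - start < limit and text[i] in HEX_DIGITS:
                i += 1
            if i > start:
                # chr() raises ValueError above U+10FFFF.
                out.append(chr(int(text[start:i], 16)))
            else:
                out.append(esc)          # no digits: quote the letter (assumption)
        elif esc in "\r\n":
            # Elide the end of line, including CR-LF and LF-CR pairings.
            if i < n and text[i] in "\r\n" and text[i] != esc:
                i += 1
        else:
            out.append(esc)              # quoting mechanism
    return "".join(out)
```

For example, a backslash followed by CR-LF disappears entirely, which lets a long string literal continue on the next source line.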

Sub-expressions

Initiated by one of (, [, {, terminated by the corresponding ), ], }.

( ) are elided, replaced in the syntax tree by the bracketed sub-expression.

[ ] and { } are represented in the syntax tree by distinct operators.

[ ] is used to define arrays/environments.

{ } is used to designate an evaluation block.

Where the bracketed expression is an operand-less operator sans space, then this forms a distinct operator.

Arrays

An array can be indexed, associative, or a mix of both. An array can also act as an environment (also known as a namespace or scope).

Indexes are zero-based. Assigning to the last + 1 index appends a new entry.
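
A minimal Python model of the indexing behaviour, for illustration only. The class OboeArray and its method names are hypothetical. Keeping indexed and associative entries in separate containers is an assumption, as is raising an error for an index beyond last + 1; neither is described here.

```python
class OboeArray:
    """Hypothetical model of an array with indexed and associative entries."""

    def __init__(self):
        self.indexed = []   # zero-based entries
        self.named = {}     # associative entries (separate storage: assumption)

    def set(self, key, value):
        if isinstance(key, int):
            if key == len(self.indexed):
                self.indexed.append(value)   # last + 1 appends a new entry
            elif 0 <= key < len(self.indexed):
                self.indexed[key] = value
            else:
                raise IndexError(key)        # gaps rejected (assumption)
        else:
            self.named[key] = value

    def get(self, key):
        if isinstance(key, int):
            return self.indexed[key]
        return self.named[key]
```

Setting index 0 on an empty array appends the first entry; setting index 0 again overwrites it.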

Environments

The following environments are predefined:

local which is the default scope within a function. It can be specifically invoked using the (:) operator. There are no predefined identifiers in the local environment.

static which is the default scope within a source file. It can be specifically invoked using the {:} operator. There are no predefined identifiers in the static environment.

global, which is available to all. It can be specifically invoked using the [:] operator. Unless oboe is invoked with the --math option, there are no predefined identifiers in the global environment.

system, which can be accessed via the sigil operator.

When an environment is applied to an expression or expression-list, it is automatically linked to the current environment.

An anonymous environment can be utilized to limit the scope of variables.
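
One way to picture linked environments is a chain of name tables, sketched below in Python. The Environment class is hypothetical, and it assumes that "linked" means a lookup that fails in one environment continues in the environment it is linked to; the actual lookup order in OBOE is not specified here.

```python
class Environment:
    """Hypothetical name table that falls back to a linked environment."""

    def __init__(self, linked=None):
        self.names = {}
        self.linked = linked    # the environment this one is linked to

    def declare(self, name, value):
        self.names[name] = value

    def lookup(self, name):
        env = self
        while env is not None:
            if name in env.names:
                return env.names[name]
            env = env.linked
        raise NameError(name)


# An anonymous environment limits the scope of the names declared in it.
current = Environment()
current.declare("x", 1)
anonymous = Environment(linked=current)
anonymous.declare("tmp", 2)
```

Here anonymous can see both tmp and x, while current cannot see tmp.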

Block

Used to demarcate a block of code. Primarily this is used with conditional expressions, to isolate a block of code from unwanted interaction with the ; operator, which designates alternate program flow paths.

Operators

Predefined operators

  • applicate, has no lexical representation, but is invoked by adjacency.
  • , sequence, creates a list of expressions.
  • ; assemblage, creates a list of sequences/expressions.

User-defined operators

See lex.h for permitted lexeme characters.

Where an operand-less operator is bracketed sans space, then this forms a distinct operator.

Named operators

User-defined operators can be named by prefixing an identifier with a back-tick (`); the name can optionally be terminated with another back-tick.

Unary operators

All operators are inherently binary. When an operator is used as a unary operator, it is still parsed at the same precedence level; therefore, when a unary operation appears as a sub-expression, that sub-expression should be parenthesized.

Identifiers

See lex.h for permitted lexeme characters.

Grammar

Basic Grammar

Expressions are evaluated left to right.

Precedence levels, in decreasing order, are:

  • Primary (Values, Identifiers, Sub-expressions)
  • Applicate
  • Binding
  • Exponential
  • Multiplicative
  • Additive
  • Bitwise
  • Relational
  • Logical
  • Conditional
  • Assigning
  • Declarative
  • Interstitial
  • Sequence
  • Assemblage

Although the goal is for only binary operators, the simplicity of the implementation of parsing gives us unary operators for free - it would require more code to enforce binary only. However, in the syntax tree all operators are binary, unary operations being represented by having a non-value operand (internally this is the Zen type - Zero/Empty/Null). The empty parenthesis () operator can be used to specify Zen explicitly.

The more detailed grammar (e.g. declaration, selection, iteration) is handled at runtime; but is built from binary operators.
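
To illustrate how a simple binary-operator parser yields unary operators for free, here is a small precedence-climbing sketch in Python. It is not the OBOE parser: the reduced two-level precedence table, the tuple node shape (operator, left, right), and the use of None for Zen are all assumptions made for the example.

```python
import re

ZEN = None  # stands in for the Zen (Zero/Empty/Null) operand
PRECEDENCE = {"*": 2, "/": 2, "+": 1, "-": 1}   # reduced table; higher binds tighter
TOKEN = re.compile(r"\s*(\d+|[-+*/()])")

def tokenize(source):
    tokens, pos = [], 0
    source = source.rstrip()
    while pos < len(source):
        match = TOKEN.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens

def parse(source):
    tokens = tokenize(source)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        return token

    def primary():
        token = peek()
        if token == "(":
            advance()
            inner = expression(1)
            if peek() != ")":
                raise SyntaxError("expected )")
            advance()
            return inner            # ( ) is elided from the tree
        if token is not None and token.isdigit():
            return int(advance())
        return ZEN                  # missing operand: nothing is consumed

    def expression(min_prec):
        left = primary()
        while peek() in PRECEDENCE and PRECEDENCE[peek()] >= min_prec:
            op = advance()
            right = expression(PRECEDENCE[op] + 1)
            left = (op, left, right)    # every node is binary
        return left

    tree = expression(1)
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree
```

In this sketch parse("-3") gives ("-", ZEN, 3), and parse("()") gives ZEN. It also shows why a unary sub-expression wants parentheses: parse("2 * -3") gives ("-", ("*", 2, ZEN), 3), whereas parse("2 * (-3)") gives ("*", 2, ("-", ZEN, 3)).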

Operator Grammar and Operation

assemblage

left-operand ; right-operand

An assemblage may be evaluated differently when used as an operand, but is otherwise evaluated thus:

left-operand is evaluated, then right-operand is evaluated, the result of evaluating the right-operand is returned.

sequence

left-operand , right-operand

A sequence may be evaluated differently when used as an operand, but is otherwise evaluated thus:

left-operand is evaluated, then right-operand is evaluated, and a new sequence of the results is created. Individual operators (e.g. conditional, iteration, or selection) may handle sequences differently in certain instances.

range

operand .. operand

declaration

either:

referent : operand

or:

referent :: operand

or:

referent :^ reference

or:

referent ( parameter? (, parameter)* ) : operand

or:

[precedence-operator-string]? operator-string ( parameter? (, parameter)* ) : operand

or:

operator-string : operator-string

global declaration

Normally, non-operator declarations are made in the static environment (source-file, function, ...); if the global environment operator [:] is applied to the declaration then it is made in the global environment. Operator declarations are always made in the global environment.

static declaration

Declarations within a non-static scope (e.g. within a function) can be made static by applying the static environment operator {:}. They will be visible to all functions that share the same static scope, e.g. within the same source file.

assignment

either:

reference = operand

or:

reference =^ reference

conditional

either:

condition ? true-operand

condition is evaluated, and if the result, when cast to a boolean value, evaluates to true, then true-operand is evaluated.

or:

condition ? (true-operand ; false-operand)

condition is evaluated, and if the result, when cast to a boolean value, evaluates to true, then true-operand is evaluated, otherwise false-operand is evaluated.

Sequences in condition are evaluated as a simple list of expressions, each evaluated in turn; the result of evaluating the final expression in the sequence determines the condition.

operand ? Zen

Zen ? operand

evaluates operand and returns its boolean value.

The ! operator is as above, except the condition is inverted.
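
The conditional forms can be sketched as a small tree evaluator in Python. This is illustrative only. Nodes are (operator, left, right) tuples, leaves are plain values or zero-argument callables, and ZEN is a sentinel object; all of these are assumptions of the sketch. Returning Zen when the condition fails and no false-operand is given is also an assumption.

```python
ZEN = object()  # sentinel standing in for the Zen operand

def evaluate(node):
    """Evaluate a leaf (value or zero-argument callable) or a ?/! node."""
    if callable(node):
        return node()
    if isinstance(node, tuple):
        op, left, right = node
        if op in ("?", "!"):
            return conditional(op, left, right)
        raise ValueError(f"unsupported operator {op!r}")
    return node

def conditional(op, left, right):
    invert = (op == "!")
    if left is ZEN or right is ZEN:
        # operand ? Zen  and  Zen ? operand  yield the operand's boolean value.
        operand = right if left is ZEN else left
        return bool(evaluate(operand)) != invert
    truth = bool(evaluate(left)) != invert
    if isinstance(right, tuple) and right[0] == ";":
        # An assemblage as the right operand selects between two branches.
        _, when_true, when_false = right
        return evaluate(when_true if truth else when_false)
    return evaluate(right) if truth else ZEN   # no false branch (assumption)
```

Only the chosen branch is evaluated, which is why the ; operator here acts as a separator of alternatives rather than as "evaluate left, then right".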

selection

either:

condition ?: ( (case-expression : action-expression ;)+ default-action-expression?)

or:

Zen ?: ( (case-expression : action-expression ;)+ default-action-expression?)

Sequences in condition are evaluated as a simple list of expressions, each evaluated in turn; the result of evaluating the final expression in the sequence determines the condition.
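
As a rough model of the first selection form, the Python sketch below picks the first case that equals the condition value and otherwise runs the default action. The function select is hypothetical. Matching by equality and the absence of fall-through are assumptions based on the analogy with the C switch statement, and the Zen form is not modelled.

```python
def select(value, cases, default=None):
    """Run the action of the first matching case, else the default action.

    cases is a list of (case_value, action) pairs; actions and default are
    zero-argument callables so that only the selected one is evaluated.
    """
    for case_value, action in cases:
        if value == case_value:      # equality matching (assumption)
            return action()
    if default is not None:
        return default()
    return None                      # no match and no default (assumption)
```

For instance, select(2, [(1, lambda: "one"), (2, lambda: "two")], lambda: "other") runs only the second action.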

iteration

either:

iteration-control ?* iteration-operand

or:

iteration-control ?* ( iteration-operand ; no-iteration-operand )

no-iteration-operand is evaluated if the controlling condition never evaluates true.

or:

iteration-control ?* Zen

where iteration-control is either:

condition

or:

( initialization ; condition )

or:

( initialization ; condition ; recalculation )

or:

( identifier : range [ && condition] )

or:

( identifier = range [ && condition] )

or:

( identifier : sequence [ && condition] )

or:

( identifier = sequence [ && condition] )

or:

( identifier : array[range] [ && condition] )

or:

( identifier = array[range] [ && condition] )

or:

( identifier : [initializer] [ && condition] )

or:

( identifier = [initializer] [ && condition] )

The !* operator is as above, except the condition is inverted; does not apply to ranges.

Sequences in initialization, condition, and recalculation are evaluated as a simple list of expressions, each evaluated in turn; in the case of condition, the result of evaluating the final expression in the sequence determines the condition.
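
The ( initialization ; condition ; recalculation ) form with a no-iteration-operand can be sketched in Python as below. The function iterate and its parameter names are hypothetical, and returning the result of the last evaluated operand is an assumption; the range, sequence, and array forms are not modelled.

```python
def iterate(condition, body, initialization=None, recalculation=None,
            otherwise=None):
    """Loop while condition() is true; run otherwise() if it never was."""
    if initialization is not None:
        initialization()
    iterated = False
    result = None
    while condition():
        iterated = True
        result = body()
        if recalculation is not None:
            recalculation()
    if not iterated and otherwise is not None:
        result = otherwise()     # the no-iteration-operand
    return result


# Usage: sum 0 + 1 + 2 with an explicit counter.
state = {"i": 0, "total": 0}
iterate(
    condition=lambda: state["i"] < 3,
    body=lambda: state.update(total=state["total"] + state["i"]),
    initialization=lambda: state.update(i=0),
    recalculation=lambda: state.update(i=state["i"] + 1),
)
```

Note that this differs from Python's own while/else, whose else branch also runs after a loop that iterated and finished normally.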

evaluation

left-operand operator right-operand

The following operators are built-in:

&& logical AND

|| logical OR

< less than

<= less than or equal

== equal

<> not equal

>= greater than or equal

> greater than

& bitwise AND

| bitwise OR

~ bitwise XOR

+ add

- subtract

* multiply

/ divide

// modulo

<< shift left

>> shift right

<<< extract left

>>> extract right

<<> rotate left

<>> rotate right

The non-comparative operators also have a self-assigning form: e.g.

reference += operand
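
A few of the built-in spellings differ from C, and the self-assigning form can be read as "apply the operator, then assign". The Python sketch below shows both for a handful of operators. The table is deliberately partial, the function names are hypothetical, and the behaviour of // for negative operands is not stated here, so the example sticks to non-negative values.

```python
import operator

# Partial table; spellings that differ from C are noted.
OPERATORS = {
    "<>": operator.ne,      # not equal (C: !=)
    "~": operator.xor,      # bitwise XOR (C: ^)
    "//": operator.mod,     # modulo (C: %)
    "+": operator.add,
    "<<": operator.lshift,
}
COMPARATIVE = {"<", "<=", "==", "<>", ">=", ">"}

def apply_operator(symbol, left, right):
    return OPERATORS[symbol](left, right)

def self_assign(variables, name, symbol, operand):
    """Treat `name op= operand` as `name = name op operand`."""
    base = symbol[:-1]
    if not symbol.endswith("=") or base in COMPARATIVE:
        raise ValueError(f"{symbol!r} has no self-assigning form")
    variables[name] = apply_operator(base, variables[name], operand)
    return variables[name]
```

With x holding 5, self_assign(variables, "x", "+=", 2) stores and returns 7.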

applicate

either:

left-operand right-operand

or:

operator right-operand

sigil

operand @ identifier

Used to access the attributes and functions of operand (e.g. type query, type conversion).

When Zen is the operand, it provides access to the system library.