Collection Transformation Language (CTL)

This document captures the definition, syntax, architecture, and future extensions for CTL—a minimal DSL for transforming collections of numbers through a pipeline of steps.

Overview

CTL (Collection Transformation Language) is a small, lowercase-only domain-specific language designed to apply a sequence of transformations to collections (vectors) of numbers. It emphasizes an implicit chaining model: each operation applies to the result of the previous statement, culminating in an output step that prints the final vector.

Primary Concepts:

Program: a series of statements, executed top-to-bottom
Statement: name : expr binds the result of expr to name
Vector Literal: [n1, n2, ...]
Map: { x : <arith-expr> }
Filter: { x : <boolean-expr> }
Expand: { [<arith-expr>, ...] | x }
Output: output: prints the last bound vector

Syntax Concepts

| Concept | Surface Syntax | Semantics |
| --- | --- | --- |
| Program | sequence of `Stmt` lines | executed in order, binding vectors to names |
| Statement | `name : expr` | evaluate `expr`, store under `name` |
| Output Statement | `output:` | print the current vector |
| Vector Literal | `[1, 2, 3]` | fixed array of numbers |
| Map | `{ x : x * 2 }` | for each element of the last vector, bind to `x`, eval body |
| Filter | `{ x : x % 3 != 0 }` | keep only elements where body is true |
| Expand | `{ [x, x+1] \| x }` | for each `x`, produce a sub-vector `[x, x+1]` |

Grammar (BNF)

Program       ::= StmtList EOF

StmtList      ::= ('\n')* Stmt (('\n')+ Stmt)* ('\n')*

Stmt          ::= IDENT ':' Expr
                | 'output:'

Expr          ::= VectorLiteral
                | MapExpr
                | FilterExpr
                | ExpandExpr

MapExpr       ::= '{' IDENT ':' ArithExpr '}'

FilterExpr    ::= '{' IDENT ':' BoolExpr '}'

ExpandExpr    ::= '{' '[' ArithExprList ']' '|' IDENT '}'

BoolExpr      ::= ArithExpr ('==' | '!=' | '<' | '>' | '<=' | '>=') ArithExpr

VectorLiteral ::= '[' NumberList ']'

NumberList    ::= NUMBER (',' NUMBER)*

ArithExprList ::= ArithExpr (',' ArithExpr)*

BindingList   ::= IDENT (',' IDENT)*

ArithExpr     ::= Term (('+' | '-') Term)*

Term          ::= Factor (('*' | '/' | '%') Factor)*

Factor        ::= NUMBER
                | IDENT
                | '(' ArithExpr ')'
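
To make the arithmetic part of the grammar concrete, here is a minimal recursive-descent sketch that parses and evaluates an arithmetic expression in one pass. It uses the usual term/factor precedence layering so that *, / and % bind tighter than + and -. The names Cursor, trunc_mod and eval_arith are hypothetical, every identifier is assumed to evaluate to the single bound element, and % is assumed to be a truncating remainder; none of this is confirmed by the CTL sources.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical cursor over the source text; `x` is the value of any identifier. */
typedef struct { const char *p; double x; } Cursor;

static double arith(Cursor *c);

static void skip_ws(Cursor *c) { while (isspace((unsigned char)*c->p)) c->p++; }

/* Assumed meaning of '%': remainder after truncating division. */
static double trunc_mod(double a, double b) { return a - b * (double)(long long)(a / b); }

/* Factor ::= NUMBER | IDENT | '(' ArithExpr ')' */
static double factor(Cursor *c) {
    skip_ws(c);
    if (*c->p == '(') {
        c->p++;
        double v = arith(c);
        skip_ws(c);
        assert(*c->p == ')');                 /* sketch: abort on a missing ')' */
        c->p++;
        return v;
    }
    if (islower((unsigned char)*c->p)) {      /* IDENT: lowercase letters */
        while (islower((unsigned char)*c->p)) c->p++;
        return c->x;
    }
    char *end;
    double v = strtod(c->p, &end);
    assert(end != c->p);                      /* sketch: abort if no NUMBER here */
    c->p = end;
    return v;
}

/* Term ::= Factor (('*' | '/' | '%') Factor)* */
static double term(Cursor *c) {
    double v = factor(c);
    for (;;) {
        skip_ws(c);
        char op = *c->p;
        if (op != '*' && op != '/' && op != '%') return v;
        c->p++;
        double r = factor(c);
        v = op == '*' ? v * r : op == '/' ? v / r : trunc_mod(v, r);
    }
}

/* ArithExpr ::= Term (('+' | '-') Term)* */
static double arith(Cursor *c) {
    double v = term(c);
    for (;;) {
        skip_ws(c);
        char op = *c->p;
        if (op != '+' && op != '-') return v;
        c->p++;
        double r = term(c);
        v = op == '+' ? v + r : v - r;
    }
}

/* Evaluate `src` with every identifier bound to `x`. */
double eval_arith(const char *src, double x) {
    Cursor c = { src, x };
    return arith(&c);
}
```

For instance, eval_arith("x * 2", 3) evaluates the body of the map example for the element 3. A real parser would build AST nodes instead of computing values directly; evaluating in place just keeps the sketch short.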

(section needs revision) MapExpr, FilterExpr vs ExpandExpr

map and filter each evaluate a single-valued body once per input element: map’s body yields a number that replaces the element, and filter’s body yields a boolean that decides whether the element is kept. So their syntax is a simple binding and body:

{ x : <expr(x)> }

The colon (:) says “for each input bound to x, compute this one result.”

expand, on the other hand, can produce several outputs per input (one or more; the grammar requires a non-empty list). You need to say both “here’s how I bind the input” and “here’s the list of outputs to splice back into the pipeline.” The pipe (|) visually separates those two concerns:

{ [out1, out2, ...] | x }

meaning “for each input bound to x, emit this entire sub‑vector [out1, out2, ...].”
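
The difference between the three forms can be sketched in C with function pointers standing in for the bodies. This is an illustrative sketch, not the interpreter's code: vec_push, vec_map, vec_filter, vec_expand and the three hand-written bodies are hypothetical names, and % is assumed to act on integral values.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct { double *items; size_t len, cap; } Vec;

/* Append one value, doubling the capacity when the buffer is full. */
void vec_push(Vec *v, double x) {
    if (v->len == v->cap) {
        v->cap = v->cap ? v->cap * 2 : 4;
        v->items = realloc(v->items, v->cap * sizeof *v->items);
        assert(v->items != NULL);   /* sketch: abort on allocation failure */
    }
    v->items[v->len++] = x;
}

/* map: the body yields exactly one number per input element. */
Vec vec_map(const Vec *in, double (*body)(double)) {
    Vec out = {0};
    for (size_t i = 0; i < in->len; i++) vec_push(&out, body(in->items[i]));
    return out;
}

/* filter: the body yields a boolean; the element itself is kept or dropped. */
Vec vec_filter(const Vec *in, int (*body)(double)) {
    Vec out = {0};
    for (size_t i = 0; i < in->len; i++)
        if (body(in->items[i])) vec_push(&out, in->items[i]);
    return out;
}

/* expand: the body appends a whole sub-vector, which is spliced into the output. */
Vec vec_expand(const Vec *in, void (*body)(double, Vec *)) {
    Vec out = {0};
    for (size_t i = 0; i < in->len; i++) body(in->items[i], &out);
    return out;
}

/* Hand-written bodies for { x : x * 2 }, { x : x % 3 != 0 } and { [x, x+1] | x }. */
double twice(double x) { return x * 2; }
int not_multiple_of_3(double x) { return (long)x % 3 != 0; }   /* assumes integral values */
void with_successor(double x, Vec *out) { vec_push(out, x); vec_push(out, x + 1); }

/* Runs the three steps on [1,2,3,4,5]; the caller owns the returned buffer. */
Vec run_example(void) {
    double data[] = {1, 2, 3, 4, 5};
    Vec in = { data, 5, 5 };
    Vec m = vec_map(&in, twice);
    Vec f = vec_filter(&m, not_multiple_of_3);
    Vec e = vec_expand(&f, with_successor);
    free(m.items);
    free(f.items);
    return e;
}
```

Note that only the expand body receives the output vector: that is the code-level counterpart of the pipe syntax, where the body decides how many values to emit.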

Interpreter Architecture

CTL’s interpreter follows a classic three-stage pipeline:

Lexer: reads input characters, skips whitespace/comments, and emits a stream of Tokens:
- T_IDENT, T_NUMBER, punctuation ([, ], {, }, :, ,, |), operators (+, -, *, /, %, ==, !=, <, >, <=, >=), and T_EOF.
Parser: a recursive-descent parser consumes Tokens and constructs an Abstract Syntax Tree (AST) according to the BNF above. Key entrypoints:
- parse_program(), parse_stmt(), parse_expr(), parse_arith()
Evaluator: walks the AST, maintaining an environment that maps name -> Vec*. For each Stmt:
- VectorLiteral: allocates a Vec, populates items.
- Map/Filter: iterates over the previous Vec, binds each element, and evaluates the body expression. Map collects the results; Filter keeps the elements whose body evaluates to true.
- Expand: similar, but body yields a sub-Vec for each element, which are concatenated.
- Output: retrieves the last Vec and prints its contents.

Error conditions (syntax/runtime) immediately abort with descriptive messages.
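
As an illustration of the first stage, the following is a minimal lexer sketch. It makes several assumptions that the text above does not settle: newlines are skipped as ordinary whitespace, comments are not handled, identifiers are a lowercase letter followed by lowercase letters or digits, and tokens are slices of the source rather than owned strings. T_LE and T_GE cover the `<=` and `>=` operators from the grammar, and T_ERROR and token_type_at are hypothetical additions.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

typedef enum {
    T_IDENT, T_NUMBER,
    T_LBRACK, T_RBRACK, T_LBRACE, T_RBRACE,
    T_COLON, T_COMMA, T_PIPE,
    T_PLUS, T_MINUS, T_STAR, T_SLASH, T_PERCENT,
    T_EQ, T_NEQ, T_LT, T_GT, T_LE, T_GE,
    T_EOF,
    T_ERROR   /* hypothetical: marks an unrecognised character */
} TokenType;

/* Token as a slice of the source text instead of an owned lexeme string. */
typedef struct { TokenType type; const char *start; size_t len; } Token;

/* Scan one token starting at *src and advance *src past it. */
Token next_token(const char **src) {
    const char *p = *src;
    while (isspace((unsigned char)*p)) p++;    /* newlines treated as whitespace here */
    Token t = { T_EOF, p, 0 };
    if (*p == '\0') { *src = p; return t; }

    if (islower((unsigned char)*p)) {          /* IDENT */
        while (islower((unsigned char)*p) || isdigit((unsigned char)*p)) p++;
        t.type = T_IDENT;
    } else if (isdigit((unsigned char)*p)) {   /* NUMBER: digits, optional fraction */
        while (isdigit((unsigned char)*p)) p++;
        if (*p == '.') { p++; while (isdigit((unsigned char)*p)) p++; }
        t.type = T_NUMBER;
    } else {
        char c = *p++;
        int eq = (*p == '=');                  /* lookahead for two-character operators */
        switch (c) {
        case '[': t.type = T_LBRACK; break;
        case ']': t.type = T_RBRACK; break;
        case '{': t.type = T_LBRACE; break;
        case '}': t.type = T_RBRACE; break;
        case ':': t.type = T_COLON; break;
        case ',': t.type = T_COMMA; break;
        case '|': t.type = T_PIPE; break;
        case '+': t.type = T_PLUS; break;
        case '-': t.type = T_MINUS; break;
        case '*': t.type = T_STAR; break;
        case '/': t.type = T_SLASH; break;
        case '%': t.type = T_PERCENT; break;
        case '<': t.type = eq ? T_LE : T_LT; p += eq; break;
        case '>': t.type = eq ? T_GE : T_GT; p += eq; break;
        case '=': t.type = eq ? T_EQ : T_ERROR; p += eq; break;
        case '!': t.type = eq ? T_NEQ : T_ERROR; p += eq; break;
        default:  t.type = T_ERROR; break;
        }
    }
    t.len = (size_t)(p - t.start);
    *src = p;
    return t;
}

/* Type of the n-th token (0-based) in src; T_EOF once the input is exhausted. */
TokenType token_type_at(const char *src, size_t n) {
    assert(src != NULL);
    Token t = next_token(&src);
    while (n-- > 0 && t.type != T_EOF) t = next_token(&src);
    return t.type;
}
```

A parser would call next_token in a loop and dispatch on the token type; token_type_at exists only to make the sketch easy to poke at.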

Key Data Structures (C)

#include <stddef.h>  // size_t

// Token representation
typedef enum {
    T_IDENT, T_NUMBER,
    T_LBRACK, T_RBRACK, T_LBRACE, T_RBRACE,
    T_COLON, T_COMMA, T_PIPE,
    T_PLUS, T_MINUS, T_STAR, T_SLASH, T_PERCENT,
    T_EQ, T_NEQ, T_LT, T_GT, T_LE, T_GE,  // T_LE, T_GE: the grammar's <= and >=
    T_EOF,
} TokenType;

typedef struct {
    TokenType type;
    char *lexeme;   // for IDENT and NUMBER
} Token;

// AST node kinds
typedef enum {
    AST_PROGRAM,
    AST_STMT,
    AST_VECTOR_LITERAL,
    AST_MAP_EXPR,     // also used for filter
    AST_EXPAND_EXPR,
    AST_BINOP,
    AST_VAR,
    AST_NUM,
} AstKind;

typedef struct AstNode AstNode;
typedef struct Vec Vec;

struct AstNode {
    AstKind kind;
    union {
        // AST_STMT
        struct { char *name; AstNode *expr; } stmt;
        // AST_VECTOR_LITERAL
        Vec *vector;
        // AST_MAP_EXPR / AST_EXPAND_EXPR
        struct {
            char **bindings;   // array of binding names
            int n_bindings;
            AstNode *body;     // arithmetic or sub-vector expr
        } transform;
        // AST_BINOP
        struct {
            TokenType op;
            AstNode *left, *right;
        } binop;
        // AST_VAR
        char *varname;
        // AST_NUM
        double num;
    } as;
};

// Dynamic array of doubles
struct Vec {
    double *items;
    size_t len;
    size_t cap;
};
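
To show how the evaluator might walk these nodes, here is a small sketch that evaluates an arithmetic tree for one bound element. It uses a trimmed-down copy of the structures above (only the kinds and operators it needs). eval_node and demo_eval are hypothetical names, every AST_VAR is assumed to refer to the single bound element, and % is assumed to be a truncating remainder.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { T_PLUS, T_MINUS, T_STAR, T_SLASH, T_PERCENT } TokenType;  /* subset */
typedef enum { AST_BINOP, AST_VAR, AST_NUM } AstKind;                   /* subset */

typedef struct AstNode AstNode;
struct AstNode {
    AstKind kind;
    union {
        struct { TokenType op; const AstNode *left, *right; } binop;
        const char *varname;
        double num;
    } as;
};

/* Evaluate an arithmetic tree with the bound element equal to x. */
double eval_node(const AstNode *n, double x) {
    switch (n->kind) {
    case AST_NUM: return n->as.num;
    case AST_VAR: return x;               /* single-binding simplification */
    case AST_BINOP: {
        double l = eval_node(n->as.binop.left, x);
        double r = eval_node(n->as.binop.right, x);
        switch (n->as.binop.op) {
        case T_PLUS:    return l + r;
        case T_MINUS:   return l - r;
        case T_STAR:    return l * r;
        case T_SLASH:   return l / r;
        case T_PERCENT: return l - r * (double)(long long)(l / r);  /* assumed semantics */
        }
    }
    }
    assert(0 && "unknown node kind or operator");
    return 0.0;
}

/* Builds the tree for `x * 2 + 1` by hand and evaluates it at x. */
double demo_eval(double x) {
    AstNode var = { .kind = AST_VAR, .as.varname = "x" };
    AstNode two = { .kind = AST_NUM, .as.num = 2.0 };
    AstNode one = { .kind = AST_NUM, .as.num = 1.0 };
    AstNode mul = { .kind = AST_BINOP, .as.binop = { T_STAR, &var, &two } };
    AstNode add = { .kind = AST_BINOP, .as.binop = { T_PLUS, &mul, &one } };
    return eval_node(&add, x);
}
```

A map step would call eval_node once per element of the previous Vec and push each result into a fresh Vec.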

Example Program

input: [1,2,3,4,5]

map:    { x : x * 2 }
filter: { x : x % 3 != 0 }
expand: { [x, x+1] | x }

output:

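This produces: [2,3,4,5,8,9,10,11]. The map step yields [2,4,6,8,10], the filter step keeps [2,4,8,10], and the expand step emits each x followed by x+1.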

Future Extensions (v2)

In later versions of CTL, we may add these operations:

1. Reduce

Syntax:

reduce: { acc, x : <acc_expr> }

acc: running accumulator, x: current element.
Folds vector to a single value (left-to-right).

Example:

input: [1,2,3,4]
reduce: { acc, x : acc + x }
output:

Yields 10.
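
One possible shape for reduce, as a sketch only: since the proposed syntax gives no initial accumulator, this version assumes the first element seeds it and reports failure on an empty vector. vec_reduce, add and demo_sum are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Left fold. Assumption: the first element seeds the accumulator.
   Returns 0 and leaves *out untouched when the vector is empty. */
int vec_reduce(const double *items, size_t len,
               double (*step)(double acc, double x), double *out) {
    assert(out != NULL);
    if (len == 0) return 0;
    double acc = items[0];
    for (size_t i = 1; i < len; i++) acc = step(acc, items[i]);
    *out = acc;
    return 1;
}

/* Hand-written body for { acc, x : acc + x }. */
double add(double acc, double x) { return acc + x; }

/* Applies the body above to [1,2,3,4]. */
double demo_sum(void) {
    double data[] = {1, 2, 3, 4};
    double result = 0.0;
    int ok = vec_reduce(data, 4, add, &result);
    assert(ok);
    (void)ok;   /* silence the unused warning when assertions are disabled */
    return result;
}
```

An explicit initial value in the syntax would remove the empty-vector special case; that is a design question the v2 proposal leaves open.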

2. Zip

Syntax:

zip: { a, b : <expr(a,b)> }

a, b bind to elements of the two most recent vectors.
Stops at the shorter vector’s length.

Example:

input1: [1,2,3]
input2: [4,5,6]
zip: { a, b : a * b }
output:

Yields [4,10,18].
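
A sketch of how zip could be evaluated, assuming plain arrays for the two inputs and a caller-provided output buffer. vec_zip, product and demo_zip_at are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Combine a[i] and b[i] pairwise, stopping at the shorter input.
   `out` must have room for min(na, nb) values. Returns the count written. */
size_t vec_zip(const double *a, size_t na, const double *b, size_t nb,
               double (*body)(double, double), double *out) {
    assert(a != NULL && b != NULL && body != NULL && out != NULL);
    size_t n = na < nb ? na : nb;
    for (size_t i = 0; i < n; i++) out[i] = body(a[i], b[i]);
    return n;
}

/* Hand-written body for { a, b : a * b }. */
double product(double a, double b) { return a * b; }

/* Element i of zipping [1,2,3] with [4,5,6]; 0.0 if i is out of range. */
double demo_zip_at(size_t i) {
    double a[] = {1, 2, 3}, b[] = {4, 5, 6}, out[3];
    size_t n = vec_zip(a, 3, b, 3, product, out);
    return i < n ? out[i] : 0.0;
}
```

Passing inputs of different lengths, for example three and two elements, writes only two results, which matches the "stops at the shorter vector" rule.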

3. Matrix Operations

a) Transpose

Syntax:

transpose:

Swaps rows and columns of the current matrix.
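
The proposal does not say how matrices would be stored. Assuming a flat row-major array of doubles (an assumption of this sketch, as are the names mat_transpose and demo_transpose_at), transpose could look like this:

```c
#include <assert.h>
#include <stddef.h>

/* `in` is rows x cols, row-major; `out` receives cols x rows.
   The two buffers must not overlap. */
void mat_transpose(const double *in, size_t rows, size_t cols, double *out) {
    assert(in != NULL && out != NULL);
    for (size_t r = 0; r < rows; r++)
        for (size_t c = 0; c < cols; c++)
            out[c * rows + r] = in[r * cols + c];
}

/* Entry (r, c) of the transpose of [[1,2,3],[4,5,6]], which is 3 x 2. */
double demo_transpose_at(size_t r, size_t c) {
    double m[] = {1, 2, 3, 4, 5, 6}, t[6];
    mat_transpose(m, 2, 3, t);
    return t[r * 2 + c];
}
```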

b) Dot Product

Syntax:

dot: { a, b }

For 1-D vectors: inner product.
For 2-D matrices: matrix multiplication.

Example:

a: [[1,2],[3,4]]
b: [[5,6],[7,8]]
dot: { a, b }
output:

Yields [[19,22],[43,50]].
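
Both cases can be sketched over flat row-major arrays; the storage layout and the names vec_dot, mat_mul and demo_dot_at are assumptions, not part of the proposal.

```c
#include <assert.h>
#include <stddef.h>

/* 1-D case: inner product of two length-n vectors. */
double vec_dot(const double *a, const double *b, size_t n) {
    assert(a != NULL && b != NULL);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) sum += a[i] * b[i];
    return sum;
}

/* 2-D case: out (n x m) = a (n x k) * b (k x m), all row-major. */
void mat_mul(const double *a, const double *b,
             size_t n, size_t k, size_t m, double *out) {
    assert(a != NULL && b != NULL && out != NULL);
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < m; j++) {
            double sum = 0.0;
            for (size_t p = 0; p < k; p++) sum += a[i * k + p] * b[p * m + j];
            out[i * m + j] = sum;
        }
}

/* Entry (r, c) of [[1,2],[3,4]] times [[5,6],[7,8]]. */
double demo_dot_at(size_t r, size_t c) {
    double a[] = {1, 2, 3, 4}, b[] = {5, 6, 7, 8}, out[4];
    mat_mul(a, b, 2, 2, 2, out);
    return out[r * 2 + c];
}
```

The four entries returned by demo_dot_at reproduce the matrix from the example above.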
