A versatile command line tool to deal with streams, with a (mostly point-free) functional approach.
State: working and usable, but I want to 'compile' scripts, that is, lower them to a linear sequence of instructions. The current way of interpreting is very wasteful.
Also see the 'wip and such' below.
Similarly to commands such as `awk` or `sed`, `sel` takes a script to apply to its standard input to produce its standard output.
In its most basic form, the script given to `sel` is a series of functions separated by `,` (comma). See the complete syntax below. In this way, each function transforms its input and passes its output to the next one (`-` is the function that returns the input stream):
$ printf 12-42-27 | sel -, split :-:, map [add 1], join :-:
13-43-28
$ printf abc | sel -, codepoints # same as 'sel codepoints -'
97
98
99
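To make the chaining concrete, here is a rough Python model of the first example. It is only a sketch of the idea, not how `sel` is implemented: `pipeline` and `script` are hypothetical names, and the explicit `int`/`str` conversions stand in for the conversions that `sel` inserts automatically.

```python
from functools import reduce


def pipeline(*stages):
    """Chain stages left to right, like `f, g, h` in a sel script."""
    return lambda value: reduce(lambda acc, stage: stage(acc), stages, value)


# Python stand-ins for: split :-:, map [add 1], join :-:
script = pipeline(
    lambda text: text.split("-"),
    lambda parts: [int(part) + 1 for part in parts],
    lambda nums: "-".join(str(num) for num in nums),
)

print(script("12-42-27"))  # 13-43-28
```

Each stage only sees the output of the previous one, which is what keeps the script readable left to right.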
When the first argument names a file starting with `#!`, the file is read and parsed first. Any additional arguments are also parsed in continuation of the script.
$ cat pred.sel
#!/usr/bin/env sel
sub 1
$ sel pred.sel 5
4
TLDR:
- lists: `{0b1, 0o2, 0x3, 4.2}`, strings: `:hi how you:`
- `my-func first-arg [add 1 2] third-arg`
- `f, g` is `g(f(..))` (or `pipe f g` or `[flip compose] g f`)
Special characters and keywords: `,` `:` `=` `[` `]` `def` `let` `use` `{` `}`
3 special forms:
- `def name :description: value` will define a new name that is essentially replaced by value wherever it is used;
- `use :some/file: f` will 'import' all the defined names from the file as `f-<name>` (or as is if using `_`);
- `let pattern result fallback` will make a function of one argument that computes result if pattern matches; pattern can introduce names (eg. `let {a, b,, rest} [add a b] 0`, where `,, rest` matches the rest of the list); if pattern is irrefutable then there is no fallback.
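As an illustration of the `let` form, the following Python sketch models `let {a, b,, rest} [add a b] 0`. It rests on assumptions: `let_list` is a hypothetical helper, only list patterns made of plain names are handled, and the fallback is treated as a plain value returned when the match fails.

```python
def let_list(names, rest_name, result, fallback):
    """Hypothetical model of `let {names..,, rest_name} result fallback`."""
    def matcher(value):
        # The pattern needs a list with at least one item per name.
        if not isinstance(value, list) or len(value) < len(names):
            return fallback
        # Without a `,, rest` part the lengths must agree exactly.
        if rest_name is None and len(value) != len(names):
            return fallback
        bindings = dict(zip(names, value))
        if rest_name is not None:
            bindings[rest_name] = value[len(names):]
        return result(**bindings)
    return matcher


# let {a, b,, rest} [add a b] 0
add_first_two = let_list(["a", "b"], "rest", lambda a, b, rest: a + b, 0)

print(add_first_two([1, 2, 3]))  # 3
print(add_first_two([1]))        # 0
```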
Here is the complete syntax:
top ::= {'use' <bytes> <word> ','} {'def' <word> <bytes> <value> ','} [<script>]
script ::= <apply> {',' <apply>}
apply ::= <binding> | <value> {<value>}
value ::= <atom> | <subscr> | <list> | <pair>
binding ::= 'let' (<irrefut> <value> | <pattern> <value> <value>)
irrefut ::= <word> | <irrefut> '=' <irrefut>
pattern ::= <atom> | <patlist> | <patpair>
patlist ::= '{' [<pattern> {',' <pattern>} [',' [',' <word>]]] '}'
patpair ::= (<atom> | <patlist>) '=' <pattern>
atom ::= <word> | <bytes> | <number>
subscr ::= '[' <script> ']'
list ::= '{' [<apply> {',' <apply>} [',']] '}'
pair ::= (<atom> | <subscr> | <list>) '=' <value>
word ::= /[-a-z]+/ | '_'
bytes ::= /:([^:]|::)*:/
number ::= /0b[01]+/ | /0o[0-7]+/ | /0x[0-9A-Fa-f]+/ | /[0-9]+(\.[0-9]+)?/
comment ::= '#' /.*/ '\n'
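The lexical rules at the bottom of the grammar translate almost directly to regular expressions. The Python sketch below reuses the patterns for `word`, `bytes`, `number` and `comment`; the token names, the grouping of punctuation into one `punct` kind, and the skipping of whitespace are assumptions made for the example, not a description of the actual lexer.

```python
import re

# Patterns for word, bytes, number and comment are taken from the grammar.
TOKEN = re.compile(r"""
    (?P<comment>\#[^\n]*)
  | (?P<bytes>:(?:[^:]|::)*:)
  | (?P<number>0b[01]+|0o[0-7]+|0x[0-9A-Fa-f]+|[0-9]+(?:\.[0-9]+)?)
  | (?P<word>[-a-z]+|_)
  | (?P<punct>[,=\[\]{}])
  | (?P<space>\s+)
""", re.VERBOSE)


def tokenize(script):
    """Split a script into (kind, text) pairs, dropping spaces and comments."""
    tokens = []
    pos = 0
    while pos < len(script):
        match = TOKEN.match(script, pos)
        if match is None:
            raise ValueError(f"unexpected character {script[pos]!r} at offset {pos}")
        if match.lastgroup not in ("space", "comment"):
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


for kind, text in tokenize("-, split :-:, map [add 1], join :-:"):
    print(kind, text)
```

Note that `::` inside a bytestring stays part of the same token, so `:a::b:` is lexed as a single `bytes` token.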
The objective here was to make it possible to type the script plainly in any (most?) shell without worrying about quoting much if at all:
- the script can span multiple arguments, they are joined naturally with a single space
- the single and double quotes are not used, so to feel safer the whole script can be quoted
One case which can cause problems is lists (`{ .. }`), which can be interpreted by the shell (as a glob or brace expansion) if not containing a space. For that reason, it is highly recommended to keep the space after the `,` separating items.
Type notations are inspired by Haskell:
- number and bytestring: `Num` and `Str`;
- list: `[a]`;
- function: `a -> b`; when `b` is itself a function it will be `a -> x -> y`, but when `a` is a function then it is `(x -> y) -> b`;
- pair: `(a, b)`.
Lists and bytestrings can take a `+` suffix (eg. `Str+` and `[Num]+`) which represents a potentially unbounded object (the simplest example is `repeat 1 :: [Num]+`, an infinite list of 1s).
The item type of a literal list is inferred as the list is parsed:
- `{1, 2, 3, :soleil:}` not ok because inferred as `[Num]`
- `{repeat 1, {1}} :: [[Num]+]` ok because `{1}` can 'lose' its bounded characteristic safely
- `{{1}, repeat 1}` not ok because inferred as `[[Num]]` at the first item and `repeat 1` can never 'lose' its unbounded characteristic safely
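The three cases above can be reproduced with a small Python model. This is a sketch under assumptions: the classes `Num`, `Str`, `List` and the functions `fits` and `infer_literal_list` are hypothetical, the first item is taken to fix the item type, and the rule is applied recursively to nested lists, which the text does not confirm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Num:
    pass


@dataclass(frozen=True)
class Str:
    unbounded: bool = False   # True models `Str+`


@dataclass(frozen=True)
class List:
    item: object
    unbounded: bool = False   # True models `[a]+`


def fits(expected, actual):
    """True if a value of type `actual` can be used where `expected` is wanted."""
    if type(expected) is not type(actual):
        return False
    if isinstance(expected, Num):
        return True
    # A bounded value may 'lose' its boundedness; the reverse is never safe.
    if actual.unbounded and not expected.unbounded:
        return False
    if isinstance(expected, List):
        return fits(expected.item, actual.item)
    return True


def infer_literal_list(item_types):
    """Type a non-empty literal list: the first item fixes the item type."""
    first, *others = item_types
    for other in others:
        if not fits(first, other):
            raise TypeError(f"item of type {other} does not fit {first}")
    return List(first)


# {repeat 1, {1}} :: [[Num]+]
print(infer_literal_list([List(Num(), unbounded=True), List(Num())]))
```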
The CLI `-t` option will give the type of the expression.
When a direct function argument doesn't match the parameter, one of these functions is automatically inserted:
wanted | true type | inserted |
---|---|---|
`Num` | `Str+` | `, tonum,` |
`Str` | `Num` | `, tostr,` |
`[Num]+` | `Str+` | `, codepoints,` |
`[Str]+` | `Str+` | `, graphemes,` |
`Str+` | `[Str+]+` | `, ungraphemes,` |
`Str+` | `[Num]+` | `, uncodepoints,` |
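One way to picture this table is as a lookup keyed by the wanted and true types. The Python sketch below is only a model: `COERCIONS` and `inserted_function` are hypothetical names, types are compared as plain strings, and letting a bounded type (eg. `Str`) match a row written with its unbounded form (`Str+`) is an assumption.

```python
# (wanted, true type) -> function inserted between the two
COERCIONS = {
    ("Num", "Str+"): "tonum",
    ("Str", "Num"): "tostr",
    ("[Num]+", "Str+"): "codepoints",
    ("[Str]+", "Str+"): "graphemes",
    ("Str+", "[Str+]+"): "ungraphemes",
    ("Str+", "[Num]+"): "uncodepoints",
}


def inserted_function(wanted, actual):
    """Return the name of the function to insert, or None if types agree."""
    if wanted == actual:
        return None
    # Assumption: a bounded type can stand in for its unbounded form.
    candidates = [actual] if actual.endswith("+") else [actual, actual + "+"]
    for candidate in candidates:
        name = COERCIONS.get((wanted, candidate))
        if name is not None:
            return name
    raise TypeError(f"no conversion from {actual} to {wanted}")


# In `split :-:, map [add 1]`, add wants a Num but receives a Str.
print(inserted_function("Num", "Str"))  # tonum
```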
There is also a (for now temporary) behavior on the output depending on the type:
- `Num`: printed with a newline
- `(a, b)`: printed with a tabulation between the two and a newline
- `Str`: printed as is
- `[a]`: printed with a newline after each entry
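These output rules can be modelled in a few lines of Python. The mapping is an assumption made for the sketch: Python numbers stand for `Num`, tuples for pairs, `str` for `Str` and lists for `[a]`; `show` is a hypothetical name, and formatting nested values with `str` is a simplification.

```python
import io
import sys


def show(value, out=sys.stdout):
    """Write a result the way the rules above describe, depending on its type."""
    if isinstance(value, (int, float)):
        out.write(f"{value}\n")                   # Num: value then newline
    elif isinstance(value, tuple):
        out.write(f"{value[0]}\t{value[1]}\n")    # pair: tab between, newline
    elif isinstance(value, str):
        out.write(value)                          # Str: as is, nothing added
    else:
        for entry in value:
            out.write(f"{entry}\n")               # list: newline after each entry


buffer = io.StringIO()
show([97, 98, 99], buffer)   # same shape as the `codepoints` example
sys.stdout.write(buffer.getvalue())
```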
The existing functions can be queried with `-l`:
$ sel -l
[... list of every function ...]
$ sel -l map add
map :: (a -> b) -> [a]+ -> [b]+
make a new list by applying an unary operation to each value from a list
add :: Num -> Num -> Num
add two numbers
$ sel -l :: 'a -> Num'
[... list of matching functions ...]
There is also an undocumented word that completely aborts the parsing: `fatal`.
Python, Haskell, Rust, jq, tree-sitter, dt, Helix
- try to free indices that are not used
- polish for cases such as 2 `a`s being distinct
- ex of inf type: `(a -> a) -> a <- (b -> Num) -> b`
- something about pseudo syntaxes in named type ('paramof', 'returnof', 'a=b', also '?' and '?abc')
something like `$PYTHONSTARTUP`, between prelude and user script
process description of `def`s (eg. markdown-ish?)
maybe name for var types in there
- `{1, 2, 3}, map ln` could tostr in map
- `split :-:, map [add 1]` could tonum in map
- `add 1, tonum` could tostr in between
- constant folding; because pure, identify what is not compile-time known:
  - can fold: literals (numbers, bytestrings), a list if all items can be folded, a call if all arguments are provided and can be folded
  - cannot fold: `input`, infinite sources (they cannot be turned into a finite structure but can still be expressed statically), 'control-point' functions and/or functions with side effects if ever
- thunks? but I'm wondering if there is a way to even more directly put the instructions at the location at compile time rather than packing them at run time
- lifetime tracking, or maybe 'duplication tracking'
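The 'can fold' / 'cannot fold' rules from the constant folding item could be checked with something like the Python sketch below. Nothing here exists in `sel`: the node classes, the `arity` field and the `NEVER_FOLDED` set (with `repeat` as its only member, standing for infinite sources) are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Literal:
    value: object            # a number or a bytestring


@dataclass
class ListExpr:
    items: list


@dataclass
class Call:
    name: str
    arity: int               # number of parameters the function takes
    args: list = field(default_factory=list)


@dataclass
class Input:
    pass                     # the input stream, never known at compile time


# Hypothetical set of infinite sources and side-effecting functions.
NEVER_FOLDED = {"repeat"}


def can_fold(expr):
    """True if the expression is fully known at compile time."""
    if isinstance(expr, Literal):
        return True
    if isinstance(expr, ListExpr):
        return all(can_fold(item) for item in expr.items)
    if isinstance(expr, Call):
        if expr.name in NEVER_FOLDED:
            return False
        return len(expr.args) == expr.arity and all(can_fold(arg) for arg in expr.args)
    return False             # Input and anything unknown


add_1_2 = Call("add", 2, [Literal(1), Literal(2)])   # add 1 2: all arguments given
add_1 = Call("add", 2, [Literal(1)])                 # add 1: still waiting for one

print(can_fold(add_1_2), can_fold(add_1))  # True False
```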
the GitHub "Need inspiration?" bit was "super-spoon"