Syrup is a lightweight and easy-to-develop (and reasonably easy to read) serialization of Preserves. You can also think of it as an extension of the ideas of canonical s-expressions and Bencode.
For the sake of simplicity, let’s look at examples in Python.
We’ve got bytestrings and strings and integers and booleans:
>>> from syrup import syrup_encode, syrup_decode, Symbol
# bytestrings
>>> syrup_encode(b"a bytestring")
b'12:a bytestring'
# strings
>>> syrup_encode("a string")
b'8"a string'
# symbols (maybe only lispy people care)
>>> syrup_encode(Symbol('foo'))
b"3'foo"
# integers
>>> syrup_encode(42)
b'42+'
>>> syrup_encode(0)
b'0+'
>>> syrup_encode(-123)
b'123-'
# floats (single and double precision, but Python only supports double)
# (encoding looks ugly... floats are complicated, IEEE 754 bla bla)
>>> syrup_encode(123.456)
b'D@^\xdd/\x1a\x9f\xbew'
# booleans
>>> syrup_encode(True)
b't'
>>> syrup_encode(False)
b'f'
But maybe you want to combine things. So ok, we have lists, sets, and dictionaries:
# lists
>>> syrup_encode(["foo", 123, True])
b'[3"foo123+t]'
# dictionaries
>>> syrup_encode({"species": "cat",
... "name": "Tabatha",
... "age": 12})
b'{3"age12+4"name7"Tabatha7"species3"cat}'
# sets
>>> syrup_encode({"cookie", "milk", "napkin"})
b'#4"milk6"cookie6"napkin$'
When reading, whitespace is ignored (but we recommend using the official Preserves syntax if you’re trying to make things human-readable).
>>> syrup_decode(b'{3"age 12+ 4"name 7"Tabatha 7"species 3"cat}')
{'age': 12, 'name': 'Tabatha', 'species': 'cat'}
Arbitrary nesting is allowed, including in keys and sets, if the platform supports it.
Sets and dictionaries are both ordered, dictionaries by keys. Items and keys are sorted by comparing them in sorted form. (This might make Tony Garnock-Jones say, “yuck!”) But in this way, if two applications both agree to use Syrup, it is an acceptable form for canonicalization.
Maybe this seems almost good enough, but you need your own types. For this purpose, Syrup also provides a Record type:
>>> from syrup import record
# We could encode a date as a single iso8601 string
>>> syrup_encode(record('isodate', '2020-05-01T14:08:11'))
b'<4"date19"2020-05-01T14:08:11>'
# But records permit multiple arguments,
# so we could encode it that way too
>>> syrup_encode(record('date', 2020, 5, 1, 14, 8, 11))
b'<4"date2020+5+1+14+8+11+>'
Here’s nearly everything you need to know, taken right from a comment in the Racket implementation:
;; Booleans: t or f
;; Single flonum: F<ieee-single-float> (big endian)
;; Double flonum: D<ieee-double-float> (big endian)
;; Positive integers: <int>+
;; Negative integers: <int>-
;; Bytestrings: 3:cat
;; Strings: 3"cat (utf-8 encoded)
;; Symbols: 3'cat (utf-8 encoded)
;; Dictionary: {<key1><val1><key2><val2>}
;; Lists: [<item1><item2><item3>]
;; Records: <<label><val1><val2><val3>> (the outer <> for realsies tho)
;; Sets: #<item1><item2><item3>$
(Sorry, records look a bit confusing there, since <>
are actually
used rather than just a placeholder for a variable.)
There’s only one other key detail. Writing out a Syrup structure should always canonicalize it. The good news: this is fairly easy to do via recursion, since only dictionaries and sets are unordered. Dictionaries are ordered by their keys, sets by their items (and dictionaries must not include the same key twice). Simply write out the keys/items first, then sort them by the bytes, from lower to higher.
That’s it, really. Easy-peasy.
In comparison to canonical s-expressions, syrup uses similar syntax
when limited to lists and bytestrings, except that in writing uses
[]
rather than ()
. But as you can see above, it also defines a
lot of other types, too.
Syrup is also similar to Bencode, in that it supports integers, bytestrings, lists, and dictionaries. However its syntax for lists and dictionaries are different when syrup is written.
For funsies, the syrup decoders that ship in this repository will accept Bencode or canonical s-expression syntax (except for canonical s-expression “display hints” syntax, but those never made sense anyway… use records instead).
>>> syrup_decode(b'd3:agei12e4:name5:Missy7:species3:cate')
{b'age': 12, b'name': b'Missy', b'species': b'cat'}
But, Syrup uses {}
instead of de
for dictionaries when encoding
itself.
>>> syrup_encode({b'age': 12, b'name': b'Missy', b'species': b'cat'})
b'{3:agei12e4:name5:Missy7:species3:cat}'
Implementations in impls/ subdirectory:
External implementations:
Apache v2