Flexible Haskell library for parsing and printing S-Expressions

helvm/sexpresso

S-expresso

S-expresso is a Haskell library designed to help you parse and print data or source code encoded as an S-expression. It provides a very flexible parser and (for now) a flat printer.

What is an S-expression

An S-expression is a special form of tree-structured data: it is either an atom or a list of atoms and other S-expressions.

This datatype is the definition of an S-expression for S-expresso.

data SExpr b a = SList b [SExpr b a]
               | SAtom a

The parameter a specifies the datatype of atoms, and the parameter b is useful for keeping metadata about an S-expression, such as its source position.

SExpr is not equivalent to [a] because the latter cannot distinguish between an atom (SAtom _) and a tree containing only one atom (SList _ [SAtom _]). SExpr is also not equivalent to Tree a from Data.Tree because the latter cannot encode the empty tree (SList _ []) and does not enforce that atoms are at the leaves.
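The distinction from [a] can be made concrete with a small sketch. It redeclares the datatype locally so that it compiles on its own with only base; the helpers atoms and depth and the three sample values are hypothetical names introduced for illustration, not part of S-expresso.

```haskell
-- Local copy of the datatype from the text, so the sketch compiles on its own.
data SExpr b a = SList b [SExpr b a]
               | SAtom a
  deriving (Eq, Show)

-- Collect every atom, left to right (hypothetical helper).
atoms :: SExpr b a -> [a]
atoms (SAtom x)    = [x]
atoms (SList _ xs) = concatMap atoms xs

-- Nesting depth: an atom has depth 0, a list is one deeper than
-- its deepest child (hypothetical helper).
depth :: SExpr b a -> Int
depth (SAtom _)    = 0
depth (SList _ xs) = 1 + maximum (0 : map depth xs)

loneAtom, singleton, emptyList :: SExpr () Int
loneAtom  = SAtom 1             -- an atom
singleton = SList () [SAtom 1]  -- a list containing one atom
emptyList = SList () []         -- the empty list
```

Flattening with atoms gives [1] for both loneAtom and singleton, yet the two values are not equal as S-expressions, which is exactly the information a plain list would lose. emptyList has no atoms at all but is still a value with depth 1.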

The Sexp type

If you are only interested in the atoms, you can use the type alias Sexp, a variant of the more general SExpr datatype that carries no data in the SList constructor.

type Sexp a = SExpr () a

This type also comes with a bidirectional pattern synonym, also named Sexp, for objects of the form SList () _.

x = Sexp [A 3]                   <-> x = SList () [SAtom 3]
foo (Sexp xs)                    <-> foo (SList () xs)
foo (Sexp (Sexp ys : A x : xs))  <-> foo (SList () (SList () ys : SAtom x : xs))
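As a rough illustration of what "bidirectional" means here, the following sketch declares local stand-ins for the datatype and for the Sexp and A synonyms. These definitions are assumptions written to agree with the equivalences listed above; the library's own definitions may differ. The names example and children are hypothetical.

```haskell
{-# LANGUAGE PatternSynonyms #-}

-- Local copy of the datatype, so the sketch compiles on its own.
data SExpr b a = SList b [SExpr b a]
               | SAtom a
  deriving (Eq, Show)

type Sexp a = SExpr () a

-- Bidirectional synonym: usable both to build and to match a list
-- whose metadata field is ().
pattern Sexp :: [Sexp a] -> Sexp a
pattern Sexp xs = SList () xs

-- Shorthand for SAtom, as used in the equivalences above.
pattern A :: a -> SExpr b a
pattern A x = SAtom x

-- Used as an expression: builds SList () [SAtom 3, SList () [SAtom 4]].
example :: Sexp Int
example = Sexp [A 3, Sexp [A 4]]

-- Used as a pattern: count the direct children of a list (hypothetical helper).
children :: Sexp a -> Int
children (Sexp xs) = length xs
children _         = 0
```

The same name Sexp serves as a type alias and as a pattern because types and patterns live in separate namespaces.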

Pattern synonyms

S-expresso defines four pattern synonyms to ease programming with SExpr. The pattern L matches the SList constructor and only its sublist, disregarding the b field. The patterns ::: and Nil specify the shape of the sublist of an SList constructor. Finally, the pattern A is a shorthand for SAtom.

Together they make working with SExpr a little easier.

a = A 3                      <-> a = SAtom 3
foo (A x)                    <-> foo (SAtom x)
foo (A x1 ::: A x2 ::: Nil)  <-> foo (SList _ [SAtom x1, SAtom x2])
foo (A x ::: L xs)           <-> foo (SList _ (SAtom x : xs))
foo (L ys ::: A x ::: L xs)  <-> foo (SList _ (SList _ ys : SAtom x : xs))
foo (L x)                    <-> foo (SList _ x)

Note that a chain of ::: patterns must end with Nil, to match the empty list, or with L xs, to match the remainder of the list. Otherwise the right-hand side is bound to an S-expression rather than to a Haskell list. Indeed, if you write

foo (x ::: xs) = ...

this is equivalent to:

foo (SList b (x : rest)) = let xs = SList b rest
                           in ...

You can refer to the documentation of the ::: pattern for more information.
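To see how these patterns combine, here is a self-contained sketch. The pattern synonyms are local stand-ins written to match the equivalences listed above (in particular, ::: rebinds the tail as an SList with the same b field); they are an assumption for illustration, not the library's source. The helper unconsS, the Tok type, and the functions pair and eval are hypothetical.

```haskell
{-# LANGUAGE PatternSynonyms #-}
{-# LANGUAGE ViewPatterns #-}

-- Local copy of the datatype, so the sketch compiles on its own.
data SExpr b a = SList b [SExpr b a]
               | SAtom a
  deriving (Eq, Show)

-- Split a non-empty list into its head and the rest, keeping the b field.
unconsS :: SExpr b a -> Maybe (SExpr b a, SExpr b a)
unconsS (SList b (x : rest)) = Just (x, SList b rest)
unconsS _                    = Nothing

infixr 5 :::
pattern (:::) :: SExpr b a -> SExpr b a -> SExpr b a
pattern x ::: xs <- (unconsS -> Just (x, xs))

pattern Nil :: SExpr b a
pattern Nil <- SList _ []

pattern L :: [SExpr b a] -> SExpr b a
pattern L xs <- SList _ xs

pattern A :: a -> SExpr b a
pattern A x = SAtom x

-- Match a list of exactly two elements: the chain ends with Nil.
pair :: SExpr b a -> Maybe (SExpr b a, SExpr b a)
pair (x ::: y ::: Nil) = Just (x, y)
pair _                 = Nothing

-- Atoms of a tiny prefix-arithmetic language.
data Tok = Plus | Num Int
  deriving (Eq, Show)

-- Evaluate (+ e1 e2 ...): the chain ends with L args to grab the remainder.
eval :: SExpr b Tok -> Maybe Int
eval (A (Num n))         = Just n
eval (A Plus ::: L args) = sum <$> traverse eval args
eval _                   = Nothing
```

In pair, ending with Nil rejects lists of any other length, while in eval, ending with L args binds args to an ordinary Haskell list that can be traversed directly.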

Parsing S-expressions

The parser is based on megaparsec. S-expresso allows you to customize the following:

  • The parser for atoms
  • The opening tag (usually "("), the closing tag (usually ")") and a possible dependency of the closing tag on the opening one.
  • Whether space is required or optional between any pair of atoms.
  • How to parse space (e.g., treating comments as whitespace)

The library offers, among others, the decodeOne and decode functions. The former reads exactly one S-expression, while the latter parses many S-expressions. Both functions create a megaparsec parser from an SExprParser argument.

SExprParser is the datatype that defines how to read an S-expression. The easiest way to create an SExprParser is to call the function plainSExprParser with your own atom parser. This creates a parser in which S-expressions start with "(", end with ")", and space is mandatory between atoms.

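import Data.Void
import qualified Data.Text as T
import Text.Megaparsec
import Text.Megaparsec.Char
import qualified Text.Megaparsec.Char.Lexer as L
import Data.SExpresso.SExpr
import Data.SExpresso.Parse

type Parser = Parsec Void String

-- An atom is a non-empty run of letters.
atom :: Parser String
atom = some letterChar

-- decodeOne reads exactly one S-expression.
sexp :: Parser (SExpr () String)
sexp = decodeOne $ plainSExprParser atom

-- Returns Right (SList () [SAtom "hello", SAtom "world"])
ex1 = parse sexp "" "(hello world)"

-- Returns Right (SList () [SAtom "hello", SAtom "world", SList () [SAtom "bonjour"]])
ex2 = parse sexp "" "  (hello world(bonjour))  "

-- Returns Right (SAtom "hola")
ex2b = parse sexp "" "hola"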
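Since decode is described above as parsing many S-expressions, a sketch of reading several top-level forms from one input may help. It assumes that decode yields a list of SExpr values, that the functions live in the modules Data.SExpresso.SExpr and Data.SExpresso.Parse, and that SExpr has a Show instance; check these against the package documentation. The names program and runExample are hypothetical.

```haskell
import Data.Void (Void)
import Text.Megaparsec (Parsec, errorBundlePretty, parse, some)
import Text.Megaparsec.Char (letterChar)
-- Module names below are assumed, see the note above.
import Data.SExpresso.SExpr (SExpr)
import Data.SExpresso.Parse (decode, plainSExprParser)

type Parser = Parsec Void String

-- An atom is a non-empty run of letters.
atom :: Parser String
atom = some letterChar

-- Assumption: decode returns every top-level S-expression as a list.
program :: Parser [SExpr () String]
program = decode (plainSExprParser atom)

-- Parse three top-level forms and print one per line,
-- or print megaparsec's error report if parsing fails.
runExample :: IO ()
runExample =
  case parse program "<example>" "(define x) (define y)\n(run x y)" of
    Left err    -> putStr (errorBundlePretty err)
    Right exprs -> mapM_ print exprs
```

Using decodeOne on the same input would instead be expected to fail, because more than one S-expression is present.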

Customizing the SExprParser

S-expresso provides many functions to modify the behavior of the parser, including setTags, setTagsFromList, setSpace and setSpacingRule. Continuing the preceding example:

-- setTags
data MyType = List | Vector

listOrVector =
  let sTag = (char '(' >> return List) <|> (string "#(" >> return Vector)
      eTag = \t -> char ')' >> return t
      p = setTags sTag eTag $
          plainSExprParser atom
  in decodeOne p

-- Returns (SList List [SList Vector [SAtom "a", SAtom "b"], SAtom "c"])
ex3 = parse listOrVector "" "(#(a b) c)"

-- setTagsFromList
listOrVector2 = decodeOne $
                setTagsFromList [("(",")",List),("#(",")",Vector)] $
                plainSExprParser atom


-- Returns (SList List [SList Vector [SAtom "a", SAtom "b"], SAtom "c"])
ex4 = parse listOrVector2 "" "(#(a b) c)"

-- setSpace
withComments = decodeOne $
               -- See space in Text.Megaparsec.Char.Lexer
               setSpace (L.space space1 (L.skipLineComment ";") empty) $
               plainSExprParser atom

-- Returns (SList () [SAtom "hello", SList () [SAtom "bonjour"]])
ex5 = parse withComments "" "(hello ;world\n (bonjour))"

-- setSpacingRule
optionalSpace = decodeOne $
                setSpacingRule spaceIsOptional $
                plainSExprParser (some letterChar <|> some digitChar)

-- Returns (SList () [SAtom "hello", SAtom "1234", SAtom "world"])
ex6 = parse optionalSpace "" "(hello1234world)"

You can also directly build a custom SExprParser with the constructor SExprParser.

Adding Source Location

If you need the source position of the atoms and S-expressions, the function withLocation transforms an SExprParser b a into an SExprParser (Located b) (Located a). The Located datatype is defined here.
