GramExp

GramExp is a string matching and capturing library with the aim of being more expressive and maintianable than stanard regular expression libraries. GramExp's is based on Parser Expression Grammars described here : http://www.brynosaurus.com/pub/lang/gramExp.pdf

"PEGs are stylistically similar to CFGs with RE-like features added, much like Extended Backus-Naur Form (EBNF) notation"

GramExp is implemented in Parboiled (https://github.com/sirthias/parboiled).

GramExp supports sub-match capturing and can utilise the value stack to balance embedded values.

// classic an bn cn language problem
String grammar =  "D <- &(A !'b') 'a'* B !." +
                        "A <- 'a' A 'b' / :\n" +
                        "B <- 'b' B 'c' / :\n";
try (
  GramExp gramExp = new GramExp(grammar);
) {

  for(String input : new String[]{"abc", "aabbcc", "abbc"}) {
    
    boolean match = gramExp.match(input);
    
    System.out.println(input + " : " + (match?"match":"no match"));
  }
}

//basic html parser with named capture example and balanced tags (via push and pop)

String grammar2 =
        "/nlp/\n" +
        "document <- (tag / text)* $\n" +
        "tag <- open_tag (text / tag)* close_tag / self_close \n" +
        "open_tag <- '<' tag_type push (S attr)? '>'\n" +
        "close_tag <- '</' tag_type pop '>'\n" +
        "self_close <- '<' tag_type '/'? '>'\n" +
        "attr <- <[0-9a-zA-Z =\"'#\\[-]+ 'attr'>\n" +
        "tag_type <- <[0-9a-zA-Z]+ 'tag'>\n" +
        "text <- <(!'<'.)+ 'content'>";

try (
        GramExp gramExp = new GramExp(grammar2);
) {

        System.out.println(gramExp.groups());
        for(String input : new String[]{"<html><body>content<br>new line<br/>another line<br>badgers</body></html>"}) {
        
            System.out.println(gramExp.find(input));
            //[tag=html, tag=body, content=content, tag=br, content=new line, tag=br, content=another line, tag=br, content=badgers, tag=body, tag=html]
        }
}

The full syntax and grammar dexscribred in PEG is :

# Hierarchical syntax
Grammar <- Spacing Mode? Definition+ EndOfFile
Mode <- '/' (!'/' Char)+ '/' Spacing
Definition <- Identifier LEFTARROW Expression
Expression <- Sequence (SLASH Sequence)*
Sequence <- Prefix*
Prefix <- (AND / NOT)? Suffix / AOPEN Suffix Literal ACLOSE
Suffix <- Primary (QUESTION / STAR / PLUS)? (PUSH / POP)?
Primary <- Identifier !LEFTARROW
/ OPEN Expression CLOSE
/ Literal / Class / DOT / EMPTY / NOTHING
# Lexical syntax
Identifier <- IdentStart IdentCont* Arguments? Spacing
IdentStart <- [a-zA-Z_]
IdentCont <- IdentStart / [0-9]
Arguments <- AOPEN (Identifier / Literal)+ ACLOSE
Literal <- ['] (!['] Char)* ['] Spacing
/ ["] (!["] Char)* ["] Spacing
Class <- '[' (!']' Range)* ']' Spacing
Range <- Char '-' !']' Char / Char
Char <- '\\' [nrt'"\[\]\\]
/ '\\' [0-2][0-7][0-7]
/ '\\' [0-7][0-7]?
/ !'\\' .
LEFTARROW <- '<-' Spacing
SLASH <- '/' Spacing
AND <- '&' Spacing
NOT <- '!' Spacing
QUESTION <- '?' Spacing
STAR <- '*' Spacing
PLUS <- '+' Spacing
OPEN <- '(' Spacing
CLOSE <- ')' Spacing
DOT <- '.' Spacing
EMPTY <- ':' Spacing
NOTHING <- '~' Spacing
COPEN <- '{' Spacing
CCLOSE <- '}' Spacing
AOPEN <- '<' Spacing
ACLOSE <- '>' Spacing
PUSH <- 'push' Spacing
POP <- 'pop' Spacing
Spacing <- (Space / Comment)*
Comment <- '#' (!EndOfLine .)* EndOfLine
Space <- ' ' / '\t' / EndOfLine
EndOfLine <- '\r\n' / '\n' / '\r'
EndOfFile <- !.

Name		Name	Last commit message	Last commit date
Latest commit History 56 Commits
src		src
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
pom.xml		pom.xml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

GramExp

About

Releases

Packages

Contributors 2

Languages

License

CASM-Consulting/GramExp

Folders and files

Latest commit

History

Repository files navigation

GramExp

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages