PerceptronV/trilobyte

Trilobyte

Powerful text pattern parsing made intuitive

pip install trilobyte

Key features:

  • Variable assignment
  • Smart algorithm
  • Powerful expressions

Trilobyte matches text with an algorithm built on text keypoints, so its implementation differs substantially from most other text-search engines, such as regular-expression (regex) engines. A high-level overview of the algorithm appears near the top of ./trilobyte/keypoints/classes.py and is copied below:

# @dev
# If inputs have so far matched keypoint but keypoint is not yet completed,
#     `matched` = False, `completed` = False
# If previous inputs have matched keypoint, keypoint is complete, and next input does not match,
#     `matched` = True, `completed` = True
# If inputs have so far matched keypoint and keypoint is completed and cannot go on,
#     `matched` = True, `completed` = True
# If inputs have so far matched keypoint and keypoint is completed but can still go on,
#     `matched` = True, `completed` = False
# If previous inputs have matched keypoint, keypoint is yet to complete, and current input does not match,
#     `matched` = False, `completed` = True

# @dev
# @algorithm
# If !`matched` and !`completed` continue on current search branch with current keypoint
# If  `matched` and !`completed` continue on current search branch with current keypoint,
#                                while also forking a new search branch with next keypoint
# If !`matched` and `completed`  delete current search branch
# If  `matched` and `completed`  continue on current search branch with next keypoint

# @dev
# @algorithm
# Open new search branch with root keypoint at every new character in sequence

# @dev
# @algorithm
# When all branches have been computed, resolve conflicting (overlapping) branches,
# giving priority to branches discovered first (unless user specifies otherwise).
# Then, if user does not want recursive search, remove branches nested inside bigger branches.

# @update
# Kill branch early if already overlapped
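
The rules above can be sketched as a small branch-based matcher. This is only an illustrative sketch under stated assumptions, not Trilobyte's actual code: the `Literal` and `Run` classes, their `feed` method, and the `find_candidates` and `resolve` functions are hypothetical names invented for this example. As a simplification, a `Run` that meets a non-matching character just drops its branch, because a fork to the next keypoint was already created on the previous character.

```python
from typing import Callable, List, Tuple

Status = Tuple[bool, bool]   # (matched, completed), as in the notes above
Span = Tuple[int, int]       # (start, end) offsets, end exclusive


class Literal:
    """Hypothetical keypoint: an exact, non-empty string."""

    def __init__(self, text: str) -> None:
        self.text = text

    def feed(self, seen: int, ch: str) -> Status:
        if ch != self.text[seen]:
            return (False, True)     # mismatch before completion
        if seen + 1 == len(self.text):
            return (True, True)      # completed and cannot go on
        return (False, False)        # matching so far, not yet completed


class Run:
    """Hypothetical keypoint: one or more characters accepted by `accepts`."""

    def __init__(self, accepts: Callable[[str], bool]) -> None:
        self.accepts = accepts

    def feed(self, seen: int, ch: str) -> Status:
        if self.accepts(ch):
            return (True, False)     # completed but can still go on
        return (False, True)         # simplification: the fork already advanced


def find_candidates(pattern: list, text: str) -> List[Span]:
    """Return every candidate span, in the order it was discovered."""
    found: List[Span] = []
    branches: List[Tuple[int, int, int]] = []  # (start, keypoint index, chars seen)
    for i, ch in enumerate(text):
        branches.append((i, 0, 0))   # open a new branch at every character
        survivors = []
        for start, k, seen in branches:
            matched, completed = pattern[k].feed(seen, ch)
            if not completed:        # continue with the current keypoint
                survivors.append((start, k, seen + 1))
            if matched:              # move (or fork) to the next keypoint
                if k + 1 == len(pattern):
                    found.append((start, i + 1))
                else:
                    survivors.append((start, k + 1, 0))
            # not matched and completed: the branch is deleted
        branches = survivors
    return found


def resolve(spans: List[Span]) -> List[Span]:
    """Drop overlapping spans, giving priority to those discovered first."""
    kept: List[Span] = []
    for start, end in spans:
        if all(end <= s or start >= e for s, e in kept):
            kept.append((start, end))
    return kept


text = "xab12 ab7"
pattern = [Literal("ab"), Run(str.isdigit)]
candidates = find_candidates(pattern, text)
matches = [text[s:e] for s, e in resolve(candidates)]
```

Here `candidates` comes out as `[(1, 4), (1, 5), (6, 9)]` and `matches` as `["ab1", "ab7"]`: under first-discovered priority the shorter span `ab1` wins over `ab12`, which is the kind of conflict the notes say a user may choose to resolve differently.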

Docs

Trilobyte is still under development; the following commands have mostly been implemented programmatically, but cannot be parsed from plain text yet.

\                               / Makes the trilo treat the following expression as normal text
~ ( num ) [ pat ]               / Take the negative of pat, optionally supplying max checking length as a
                                  number `num`
* [ text ]                      / Ignore case

{ char1 - char2 }               / Command that detects any character whose Unicode code point lies between
                                  char1 and char2 (inclusive)
{ pat1 , pat2 , pat3 , ... }    / Command that detects any one trilo from the list of alternatives (use `\,`
                                  to stop the compiler from treating `,` as a delimiter)

@r ( $var / num ) [ pat ]       / Command that detects repeated patterns of pat (r for repeat),
                                  optionally specifying a variable name `var` for the number of repetitions
                                  matched, or supplying a number `num` that fixes the number of repetitions
@d [ delim_pat ] [ pat ]        / Command that detects a list of pat delimited by delim_pat (d for delimited)
@a [ main_pat ] [ rep_pat ]     / Command that detects main_pat, followed by an optional repeated occurrence of
                                  rep_pat after (a for after)
@b [ rep_pat ] [ main_pat ]     / Command that detects main_pat, preceded by an optional repeated occurrence of
                                  rep_pat before (b for before)

%s                              / The space character
%t                              / The tab character
%n                              / The newline character
%r                              / The return character

%w                              / Any whitespace character, including newline
%m                              / Only spaces or tabs (m = s + t, also think of it as mono - all on 1 line!)
%f                              / Only newline or return (f = n + r, also think of it as flush)
%U                              / Any uppercase character
%l                              / Any lowercase character
%a                              / Any alphabetical character
%d                              / Any numerical digit
%b                              / Any alphanumeric character (b for basic)

%v                              / Shorthand for any sequence of alphanumeric characters that doesn't
                                  start with a number (v for variable, as these are generally named in
                                  this convention)
%o                              / Shorthand for repetition of %w (o = r + w, also think of it as omni -
                                  everything!)
%e                              / Shorthand for repetition of %m (e = r + m, also think of it as exclusive)
%x                              / Shorthand for repetition of %f (x = r + f, also think of X as separation -
                                  that's what a sequence of %f is anyway)
%p                              / Shorthand for any comma-delimited sequence of %v (p for parameters, as
                                  these are generally formatted in this convention)

$var [ pat ]                    / Represents the expression that follows pat as a variable (use '' for pat
                                  in order to detect any possible expression)
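
To make the character-level commands above concrete, the following sketch approximates a few of them with plain Python predicates. It does not use Trilobyte's API: `SINGLE_CHAR_CLASSES`, `in_range`, `is_variable_name` and `is_parameter_list` are hypothetical names, and the use of Python's `str` methods (which follow Unicode rules) is an assumption about how each class might be defined. The sketch also assumes `%p` allows no whitespace around commas.

```python
# Approximate meaning of the single-character shorthands.
# Each predicate expects a string of exactly one character.
SINGLE_CHAR_CLASSES = {
    "%s": lambda c: c == " ",
    "%t": lambda c: c == "\t",
    "%n": lambda c: c == "\n",
    "%r": lambda c: c == "\r",
    "%w": str.isspace,                 # any whitespace, including newline
    "%m": lambda c: c in (" ", "\t"),  # spaces or tabs only
    "%f": lambda c: c in ("\n", "\r"), # newline or return only
    "%U": str.isupper,
    "%l": str.islower,
    "%a": str.isalpha,
    "%d": str.isdigit,
    "%b": str.isalnum,
}


def in_range(ch: str, lo: str, hi: str) -> bool:
    """{ lo - hi }: code point of `ch` lies between those of `lo` and `hi`, inclusive."""
    return ord(lo) <= ord(ch) <= ord(hi)


def is_variable_name(text: str) -> bool:
    """%v: alphanumeric characters that do not start with a digit."""
    return text.isalnum() and not text[0].isdigit()


def is_parameter_list(text: str) -> bool:
    """%p: a comma-delimited sequence of %v (assumed: no spaces around commas)."""
    return all(is_variable_name(part) for part in text.split(","))
```

For example, `in_range("c", "a", "z")` is true while `in_range("A", "a", "z")` is false, and `is_parameter_list("a,b2,c")` is true while `is_parameter_list("a,,b")` is false because the empty item is not a valid `%v`.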
