LPeg based syntax highlighting & Lua integration #81
I don't have any major personal qualms about using lpeg based syntax highlighting, though I'd also like us to consider basing a lexer on re2c. In terms of effort, and possibly immediate quality, using lpeg seems best for now. |
I'd be happy to (attempt to) port the APL lexer to lua, once I get the basic lua functionality working. I'm running lua 5.3.0, which seems to not have the lua_open() macro. My best guess was the following:
This patch gives me a clean compile. I can run this vis, but haven't been able to get the syntax highlighting to work. I've tried, for example :set syntax c and :set syntax ansi_c; neither is recognized as valid. |
Thanks for trying it out and sorry for the sparse documentation/error handling etc. I assume you are using your system lua libraries? It probably is a good idea to check whether it works from the lua console (where you will get detailed error messages):
On Fedora this will need the lua-lpeg package. Once this works it should in theory also work within vis. Tell vis where to look for the I tested it by means of the The default is 5.1.x because as far as I understand this is the last version fully supported by luajit. |
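The console snippet referred to above is not preserved in this copy of the thread. As a minimal sketch of that kind of check (not the original code), the following tries to load each module and prints the error message instead of aborting. The helper name `check_module` is made up for illustration, and loading `lexer` assumes the lexers directory is already on `package.path`.

```lua
-- Try to load a module and report the outcome instead of raising an error.
-- check_module is a hypothetical helper, not part of vis or Scintillua.
local function check_module(name)
  local ok, result = pcall(require, name)
  if ok then
    return true, name .. ': loaded'
  end
  -- On failure, result holds the error message listing the paths searched.
  return false, name .. ': ' .. tostring(result)
end

-- 'lexer' is only found if the lexers directory is on package.path.
for _, name in ipairs({'lpeg', 'lexer'}) do
  local _, message = check_module(name)
  print(message)
end
```

If `lpeg` fails to load here, it will most likely fail inside vis as well, which is why checking from the console first is useful.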
Running |
Thanks for noticing this, should hopefully be fixed with the latest commit 050111f. It worked here since I have symlinked Looking forward to your improved color themes. |
Ok, one step further. :D Now I get: Looks like this is a lua 5.3 issue... |
I guess my testing environment was somewhat screwed up, but it definitely works with lua 5.1.x Question is why does it work at all in Lua 5.1.x? Time to learn about the changes between different Lua versions. |
Yes, looks like removing the However syntax highlighting looks completely different to what legacy |
The default theme is based on the solarized color scheme (some mappings are most likely screwed up, the contrast between them seems too low?). The color related code in For comparison here it looks like this: This can all be modified by tweaking the files in |
With the lua-lpeg package installed, VIS_PATH defined and the above-noted edit to lexers/lexer.lua:856, I now see syntax highlighting. This is with lua 5.3 on Fedora 22. I'm seeing the same low-contrast theme that Marc posted. I looked at solarized a long time ago, so perhaps memory fails me, but I thought solarized is a low-contrast theme. |
|
I'm open to suggestions, preferably in form of patches. PS: the lua branch has been rebased on top of master, I will maintain it as a topic branch until we decide to merge it. |
Here's my first cut at an apl parser.
diff --git a/lexers/apl.lua b/lexers/apl.lua
new file mode 100644
index 0000000..e7a160b
--- /dev/null
+++ b/lexers/apl.lua
@@ -0,0 +1,55 @@
+-- ? LPeg lexer.
+
+local l = require('lexer')
+local token, word_match = l.token, l.word_match
+local P, R, S = lpeg.P, lpeg.R, lpeg.S
+
+local M = {_NAME = 'apl'}
+
+-- Whitespace.
+local ws = token(l.WHITESPACE, l.space^1)
+
+-- Comments.
+local comment = token(l.COMMENT, '⍝' * l.nonnewline^0)
+
+-- Strings.
+local sq_str = l.delimited_range("'", false, true)
+local dq_str = l.delimited_range('"')
+
+local string = token(l.STRING, sq_str + dq_str)
+
+-- Numbers.
+local number = token(l.NUMBER, l.float + l.integer)
+
+-- Keywords.
+local keyword = token(l.KEYWORD, P('⍞') + (P('⎕') * l.alpha^0))
+
+-- Variables.
+local variable = token(l.VARIABLE, (l.alpha + S('_∆⍙')) * (l.alnum + S('_∆⍙¯')^0))
+
+-- Operators.
+local operator = token(l.OPERATOR, S('{}[]()←→'))
+
+-- Labels.
+local label = token(l.LABEL, l.alnum^1 * P(':'))
+
+-- Nabla.
+local nabla = token('nabla', S('∇⍫'))
+
+M._rules = {
+ {'whitespace', ws},
+ {'comment', comment},
+ {'string', string},
+ {'number', number},
+ {'keyword', keyword},
+ {'label', label},
+ {'variable', variable},
+ {'operator', operator},
+ {'nabla', nabla},
+}
+
+M._tokenstyles = {
+
+}
+
+return M
diff --git a/lexers/lexer.lua b/lexers/lexer.lua
index 371a226..871bd1d 100644
--- a/lexers/lexer.lua
+++ b/lexers/lexer.lua
@@ -1599,6 +1599,7 @@ local files = {
[".adb|.ads"] = "ada",
[".g|.g4"] = "antlr",
[".ans|.inp|.mac"] = "apdl",
+ [".apl"] = "apl",
[".applescript"] = "applescript",
[".asm|.ASM|.s|.S"] = "asm",
[".asa|.asp|.hta"] = "asp", |
Re: low contrast of default (solarized) theme on 256-color terminal: I found it useful to change the base03 (background) color to #000000; this darkens the background and improves overall contrast. |
Re: color theme and terminal capabilities: It seems that the current solarized theme is designed for a terminal that supports TrueColor (i.e. 24-bit color). This works fine with st. However, xterm only supports 256 colors: the current theme doesn't render well on xterm. Worse still is urxvt (the non-256 variant) which supports only 88 colors. It might be useful to try to map the theme onto the colors supported by a 256-color terminal. |
Solved one problem, now we have three :)
> found it useful to change the base03 (background) color to #000000; this darkens the background and improves overall contrast.
I did this too (#000000 for base03 with solarized), but at the terminal level... It follows that if people go to the effort of using vis, they can adjust colors on their own (otherwise vim or nvim might be better suited). On Wed, Oct 28, 2015 at 4:30 PM, David B. Lamkins
|
Thanks, committed! This leaves ledger contributed by @clehner as the only format not supported by the lua based syntax highlighting code. I do not know whether he is still interested in vis?
Well the current code (borrowed from tmux) will always "dumb down" the 24bit color to one found in the 256 color palette. According to this site summarizing TrueColor support in various terminals, there exist fundamental problems with supporting TrueColor mode in curses. I don't think it is possible to do without bypassing curses (this is also an issue for dvtm). There is a curses API to change individual color contents
TLDR: yes I agree the color scheme should be usable in a 256 color terminal. Ideally also with 16 colors, or there should at least exist a low color version which is automatically used if such a setting is detected. |
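To illustrate what reducing a 24-bit color to the 256-color palette involves, here is a rough sketch in Lua. It is not the code vis borrowed from tmux; it only follows the same general idea under the usual xterm palette layout (a 6x6x6 color cube at indices 16..231 and a 24-step grey ramp at 232..255). The function names are made up for this example.

```lua
-- Channel values used by the xterm 6x6x6 color cube.
local cube_levels = {0, 95, 135, 175, 215, 255}

-- Zero-based index (0..5) of the cube level closest to an 8-bit value.
local function nearest_cube_index(v)
  local best, best_dist = 0, math.huge
  for i, level in ipairs(cube_levels) do
    local d = math.abs(level - v)
    if d < best_dist then best, best_dist = i - 1, d end
  end
  return best
end

-- Squared Euclidean distance between two RGB triples.
local function dist2(r1, g1, b1, r2, g2, b2)
  return (r1 - r2)^2 + (g1 - g2)^2 + (b1 - b2)^2
end

-- Map an RGB triple (each 0..255) to a palette index in 16..255.
local function rgb_to_256(r, g, b)
  local ri, gi, bi = nearest_cube_index(r), nearest_cube_index(g), nearest_cube_index(b)
  local cr, cg, cb = cube_levels[ri + 1], cube_levels[gi + 1], cube_levels[bi + 1]

  -- Grey ramp entry n (0..23) has the value 8 + 10 * n on all channels.
  local avg = math.floor((r + g + b) / 3)
  local n = math.floor((avg - 8) / 10 + 0.5)
  if n < 0 then n = 0 elseif n > 23 then n = 23 end
  local grey = 8 + 10 * n

  -- Prefer the grey ramp only when it is strictly closer than the cube.
  if dist2(r, g, b, grey, grey, grey) < dist2(r, g, b, cr, cg, cb) then
    return 232 + n
  end
  return 16 + 36 * ri + 6 * gi + bi
end

print(rgb_to_256(255, 0, 0))   -- pure red lands in the color cube
print(rgb_to_256(0, 43, 54))   -- solarized base03 (#002b36)
```

A dark, low-saturation color such as base03 has few close neighbours in the palette, which may be part of why the theme looks different on 256-color terminals.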
Here's a start for a ledger lexer. I have not yet tested it because I haven't been able to get syntax highlighting to work on the lua-ledger branch (tried
diff --git a/lexers/ledger.lua b/lexers/ledger.lua
new file mode 100644
index 0000000..7b787bc
--- /dev/null
+++ b/lexers/ledger.lua
@@ -0,0 +1,48 @@
+-- ? LPeg lexer.
+
+local l = require('lexer')
+local token, word_match = l.token, l.word_match
+local P, R, S = lpeg.P, lpeg.R, lpeg.S
+
+local M = {_NAME = 'ledger'}
+
+local delim = P('\t') + P(' ')
+
+-- Whitespace.
+local ws = token(l.WHITESPACE, l.space^1)
+
+-- Comments.
+local comment = token(l.COMMENT, S('#;') * l.nonnewline^0)
+
+-- Date.
+local date = token(l.CONSTANT, l.starts_line(l.integer * S(' \t')^1))
+
+-- Account.
+local account = token(l.VARIABLE, l.starts_line(S(' \t')^1 * M.print * -delim))
+
+-- Amount.
+local amount = token(l.NUMBER, delim * (1 - S(';\r\n')^1))
+
+-- Automated transactions.
+local auto_tx = token(l.PREPROCESSOR, l.starts_line(S('=~') * l.nonnewline^0))
+
+-- Directives.
+local directive_word = word_match{
+ 'account', 'alias', 'assert', 'bucket', 'capture', 'check', 'comment',
+ 'commodity', 'define', 'end', 'fixed', 'endfixed', 'include', 'payee',
+ 'apply', 'tag', 'test', 'year'
+} * S('AYNDCIiOobh')
+local directive = token(l.KEYWORD, l.starts_line(S('!@')^-1 * directive_word))
+
+M._rules = {
+ {'whitespace', ws},
+ {'comment', comment},
+ {'date', date},
+ {'account', account},
+ {'amount', amount},
+ {'auto_tx', auto_tx},
+ {'directive', directive},
+}
+
+return M
+
diff --git a/lexers/lexer.lua b/lexers/lexer.lua
index 871bd1d..57228fc 100644
--- a/lexers/lexer.lua
+++ b/lexers/lexer.lua
@@ -1647,6 +1647,7 @@ local files = {
[".bbl|.dtx|.ins|.ltx|.tex|.sty"] = "latex",
[".less"] = "less",
[".lily|.ly"] = "lilypond",
+ [".ledger|.journal"] = "ledger",
[".cl|.el|.lisp|.lsp"] = "lisp",
[".litcoffee"] = "litcoffee",
[".lua"] = "lua",
Also, roff/man syntax is in the C-based syntax definitions but not yet in Scintillua. It should be fairly simple to port. |
Thanks! Yes I forgot about the roff/man formats, any volunteers?
What is the problem? Did you get it to compile/link? Was everything working except for the syntax highlighting? Did you set |
@martanne I was running it in-place and was missing VIS_PATH. Got it working now. Now that I test the patch I gave I see it doesn't work. I'll submit a PR once I've fixed it |
Now that I grok the way in which lpeg treats multibyte characters, here's the patch to make the APL parser fully functional:
diff --git a/lexers/apl.lua b/lexers/apl.lua
index 6fa9af7..4e718cd 100644
--- a/lexers/apl.lua
+++ b/lexers/apl.lua
@@ -10,7 +10,7 @@ local M = {_NAME = 'apl'}
local ws = token(l.WHITESPACE, l.space^1)
-- Comments.
-local comment = token(l.COMMENT, '⍝' * l.nonnewline^0)
+local comment = token(l.COMMENT, (P('⍝') + P('#')) * l.nonnewline^0)
-- Strings.
local sq_str = l.delimited_range("'", false, true)
@@ -19,22 +19,39 @@ local dq_str = l.delimited_range('"')
local string = token(l.STRING, sq_str + dq_str)
-- Numbers.
-local number = token(l.NUMBER, l.float + l.integer)
+local dig = R('09')
+local rad = P('.')
+local exp = S('eE')
+local img = S('jJ')
+local sgn = P('¯')^-1
+local float = sgn * (dig^0 * rad * dig^1 + dig^1 * rad * dig^0 + dig^1)
+ * (exp * sgn *dig^1)^-1
+local number = token(l.NUMBER, float * img * float + float)
-- Keywords.
-local keyword = token(l.KEYWORD, P('⍞') + (P('⎕') * l.alpha^0))
+local keyword = token(l.KEYWORD, P('⍞') + P('χ') + P('⍺') + P('⍶')
+ + P('⍵') + P('⍹') + P('⎕') * R('AZ', 'az')^0)
+
+-- Names.
+local n1l = R('AZ', 'az')
+local n1b = P('_') + P('∆') + P('⍙')
+local n2l = n1l + R('09')
+local n2b = n1b + P('¯')
+local n1 = n1l + n1b
+local n2 = n2l + n2b
+local name = n1 * n2^0
--- Variables.
-local variable = token(l.VARIABLE, (l.alpha + S('_∆⍙')) * (l.alnum + S('_∆⍙¯')^0))
+-- Labels.
+local label = token(l.LABEL, name * P(':'))
--- Operators.
-local operator = token(l.OPERATOR, S('{}[]()←→'))
+-- Variables.
+local variable = token(l.VARIABLE, name)
--- Labels.
-local label = token(l.LABEL, l.alnum^1 * P(':'))
+-- Special.
+local special = token(l.TYPE, S('{}[]();') + P('←') + P('→') + P('◊'))
-- Nabla.
-local nabla = token('nabla', S('∇⍫'))
+local nabla = token(l.PREPROCESSOR, P('∇') + P('⍫'))
M._rules = {
{'whitespace', ws},
@@ -44,12 +61,8 @@ M._rules = {
{'keyword', keyword},
{'label', label},
{'variable', variable},
- {'operator', operator},
+ {'special', special},
{'nabla', nabla},
}
-M._tokenstyles = {
-
-}
-
return M |
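The multibyte issue behind this patch can be shown in a few lines. LPeg works on bytes, so `lpeg.S` with UTF-8 glyphs builds a set of individual bytes rather than a set of characters, while `lpeg.P` matches a whole byte sequence. The sketch below assumes the file is saved as UTF-8, and the LPeg part only runs when the `lpeg` module is installed.

```lua
-- Assumes this file is saved as UTF-8.
local nabla, del_tilde, left_arrow = '∇', '⍫', '←'

-- Each glyph is three bytes long, and all three start with the same byte.
for _, glyph in ipairs({nabla, del_tilde, left_arrow}) do
  print(glyph, #glyph, glyph:byte(1))
end

local ok, lpeg = pcall(require, 'lpeg')
if ok then
  -- A set built from the glyphs is really a set of their bytes.
  local byte_set = lpeg.S(nabla .. del_tilde)
  -- An ordered choice of literal strings matches whole glyphs only.
  local sequences = lpeg.P(nabla) + lpeg.P(del_tilde)

  -- lpeg.match returns the position after the match, or nil on failure.
  -- The byte set accepts the first byte of an unrelated glyph:
  print(lpeg.match(byte_set, left_arrow))
  -- The literal sequences reject it and consume the full glyph when they match:
  print(lpeg.match(sequences, left_arrow))
  print(lpeg.match(sequences, nabla))
end
```

This is why the patch replaces forms such as `S('∇⍫')` with `P('∇') + P('⍫')`.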
@martanne: The man parser looks easy. I can probably knock that out on Saturday. |
The built-in search path for VIS_PATH should also include /usr/local/share or respect the PREFIX set in config.mk. |
Here's the man lexer:
diff --git a/lexers/lexer.lua b/lexers/lexer.lua
index 871bd1d..b42dc26 100644
--- a/lexers/lexer.lua
+++ b/lexers/lexer.lua
@@ -1595,6 +1595,7 @@ function M.get_style(lexer, lang, token_name)
end
local files = {
+ [".1|.2|.3|.4|.5|.6|.7|.8|.9|.1x|.2x|.3x|.4x|.5x|.6x|.7x|.8x|.9x"] = "man",
[".as|.asc"] = "actionscript",
[".adb|.ads"] = "ada",
[".g|.g4"] = "antlr",
diff --git a/lexers/man.lua b/lexers/man.lua
new file mode 100644
index 0000000..d67fc5c
--- /dev/null
+++ b/lexers/man.lua
@@ -0,0 +1,35 @@
+-- ? LPeg lexer.
+
+local l = require('lexer')
+local token, word_match = l.token, l.word_match
+local P, R, S = lpeg.P, lpeg.R, lpeg.S
+
+local M = {_NAME = 'man'}
+
+-- Whitespace.
+local ws = token(l.WHITESPACE, l.space^1)
+
+-- Markup.
+local rule1 = token(l.STRING,
+ P('.') * (P('B') * P('R')^-1 + P('I') * P('PR')^-1) * l.nonnewline^0)
+local rule2 = token(l.NUMBER, P('.') * S('ST') * P('H') * l.nonnewline^0)
+local rule3 = token(l.KEYWORD,
+ P('.br') + P('.DS') + P('.RS') + P('.RE') + P('.PD'))
+local rule4 = token(l.LABEL, P('.') * (S('ST') * P('H') + P('.TP')))
+local rule5 = token(l.VARIABLE,
+ P('.B') * P('R')^-1 + P('.I') * S('PR')^-1 + P('.PP'))
+local rule6 = token(l.TYPE, P('\\f') * S('BIPR'))
+local rule7 = token(l.PREPROCESSOR, l.starts_line('.') * l.alpha^1)
+
+M._rules = {
+ {'whitespace', ws},
+ {'rule1', rule1},
+ {'rule2', rule2},
+ {'rule3', rule3},
+ {'rule4', rule4},
+ {'rule5', rule5},
+ {'rule6', rule6},
+ {'rule7', rule7},
+}
+
+return M
|
Thanks, applied!
Yes I agree, not yet sure whether I should always just hardcode |
For now, I added |
Are there any outstanding issues blocking a merge of the lua branch onto master? |
Better color themes? The question is also whether to do a release before the merge? |
Point taken regarding themes. See below for a theme that'll work on color-limited terminals. FWIW, I'd prefer to see the lua-based version be the release version. Better to get that in front of users, IMO, than to get them used to the regexp-based parser and make them switch later. I've found no behavioral regressions in the lua branch and noticed that the highlighting performance is much improved. |
Here's a theme that'll work on color-limited terminals:
|
Thanks, I applied it and added some code to select the theme based on the terminal capabilities. It can be overridden via the Having said that, I don't really like the low color theme. It has too much green for my taste and the comments should not be bold? Maybe we should just copy the default vim theme? If not for the theme issue, the lua branch could be merged into master (modulo bugs I introduced during the latest round of code shuffling). |
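Selecting a theme from the terminal's capabilities could look roughly like the sketch below. This is not the code that was committed. The theme names, the `pick_theme` function and its `override` parameter are all hypothetical, and the color count is assumed to come from whatever the terminal layer reports.

```lua
-- Hypothetical theme names; the real file names in the vis tree may differ.
local THEME_FULL = 'solarized'
local THEME_LOW = 'low-color'

-- colors:   number of colors the terminal reports (e.g. 8, 16, 88, 256)
-- override: optional theme name chosen explicitly by the user
local function pick_theme(colors, override)
  if override and override ~= '' then return override end
  if colors and colors >= 256 then return THEME_FULL end
  return THEME_LOW
end

print(pick_theme(256))
print(pick_theme(88))
print(pick_theme(8, 'my-theme'))
```

Treating an unknown color count as low-color keeps the fallback readable on terminals that report nothing useful.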
Thanks. Here's an alternate low-color theme that's more in the spirit of vim, at least as I see the defaults on my installation. Note that it's probably not possible to match all the details of vim's highlighting without changing parsers. For example, in a C file vim treats a system include path the same way as a string. There are differences in keyword recognition, too. Also, my installation of vim uses colors not in the 16-color palette for its default theme. I intentionally deviated from vim in one respect: I color operators and punctuation cyan rather than leaving them the same white as identifiers. I prefer the visual distinction.
-- Eight-color scheme
local lexers = vis.lexers
-- dark
lexers.STYLE_DEFAULT = 'back:black,fore:white'
lexers.STYLE_NOTHING = 'back:black'
lexers.STYLE_CLASS = 'fore:yellow'
lexers.STYLE_COMMENT = 'fore:blue'
lexers.STYLE_CONSTANT = 'fore:cyan'
lexers.STYLE_DEFINITION = 'fore:blue'
lexers.STYLE_ERROR = 'fore:red,italics'
lexers.STYLE_FUNCTION = 'fore:blue,bold'
lexers.STYLE_KEYWORD = 'fore:yellow'
lexers.STYLE_LABEL = 'fore:green'
lexers.STYLE_NUMBER = 'fore:red'
lexers.STYLE_OPERATOR = 'fore:cyan'
lexers.STYLE_REGEX = 'fore:green'
lexers.STYLE_STRING = 'fore:red'
lexers.STYLE_PREPROCESSOR = 'fore:magenta'
lexers.STYLE_TAG = 'fore:red'
lexers.STYLE_TYPE = 'fore:green'
lexers.STYLE_VARIABLE = 'fore:blue,bold'
lexers.STYLE_WHITESPACE = ''
lexers.STYLE_EMBEDDED = 'back:blue'
lexers.STYLE_IDENTIFIER = 'fore:white'
lexers.STYLE_LINENUMBER = 'fore:white'
lexers.STYLE_CURSOR = 'fore:red,back:white'
lexers.STYLE_CURSOR_LINE = 'back:white'
lexers.STYLE_COLOR_COLUMN = 'back:white'
-- lexers.STYLE_SELECTION = 'back:white'
lexers.STYLE_SELECTION = 'back:white'
|
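The style strings in the theme above are comma-separated lists of `key:value` pairs and bare flags such as `bold`. As an illustration of that format only (not how vis or Scintillua actually parses it), a small hypothetical helper could split such a string into a table:

```lua
-- Split a style string such as 'fore:red,back:white' or 'fore:blue,bold'
-- into a table. parse_style is a made-up helper for illustration.
local function parse_style(style)
  local attrs = {}
  for item in style:gmatch('[^,]+') do
    local key, value = item:match('^(%w+):(.+)$')
    if key then
      attrs[key] = value   -- e.g. fore = 'red'
    else
      attrs[item] = true   -- bare flag, e.g. bold = true
    end
  end
  return attrs
end

local cursor = parse_style('fore:red,back:white')
print(cursor.fore, cursor.back)

local func = parse_style('fore:blue,bold')
print(func.fore, func.bold)
```

An empty string, as used for STYLE_WHITESPACE, yields an empty table, which reads naturally as "inherit everything".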
I slightly tweaked the selection handling (not sure if it is better now) and merged it into master. Please give it a try. Special thanks to those who provided LPeg lexers! |
I assume "solarized terminal" refers to the solarized patch for st? I can't reproduce the issue with the git version of st and said patch; it seems to work as expected. |
During the last couple of weeks I've been investigating a new mechanism to implement syntax highlighting. So far the best solution seems to be based on LPeg.
Advantages:
strictly more powerful than regular expressions and can thus handle certain constructs which weren't possible before
Disadvantages:
make standalone is about 600K large. I still consider that acceptable.
I have no prior experience with Lua, but from what I've seen so far it seems to share quite a few properties with vis, namely being simple, small and efficient. The current state can be found in the lua branch. There are likely still some problems; in particular, nested lexers are not handled properly.
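The point that LPeg is strictly more powerful than regular expressions can be made concrete with balanced parentheses, which a classical regular expression cannot recognize. The grammar below follows the well-known example from the LPeg manual; the wrapper function `balanced` is added here for illustration and is only defined when the `lpeg` module is installed.

```lua
local ok, lpeg = pcall(require, 'lpeg')
local balanced  -- stays nil when lpeg is not available

if ok then
  local P, S, V = lpeg.P, lpeg.S, lpeg.V

  -- A group is '(' followed by any mix of non-parenthesis bytes and
  -- nested groups, followed by ')'. V(1) refers back to this same rule.
  local group = P{ '(' * ((1 - S('()')) + V(1))^0 * ')' }

  -- Require the group to span the whole subject (-1 means end of input).
  local whole = group * -1

  balanced = function(s)
    return lpeg.match(whole, s) ~= nil
  end

  print(balanced('(a(b)c)'))
  print(balanced('(a(b)c'))
end
```

The same recursion is what lets a lexer handle constructs such as nested comments.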
If we decide to go that route then there is the question which role Lua should play within vis. Should it just be an optional component enabling syntax highlighting? Or should it be tightly integrated and provide a sort of "plug-in" API? Embedding vs extending? In the extreme case, should vis be a lua script calling into an efficient C-based text manipulation library?
Comments?