vrl: null and unicode escapes not working in string literals #148
Looks like indeed those escape characters are not implemented in the VRL lexer.
After further investigation, it seems that these escape sequences used to work before, but they were not ported over in the VRL parser/compiler re-implementation PR vectordotdev/vector#6353.
Interesting, thanks for the sleuthing! cc @StephenWakely @JeanMertz for thoughts on this.
This is correct. I didn't port them over, not for any particular reason other than wanting to start out simple and add more escape sequences as the community requested them. I think it makes sense to handle escape characters similarly to how the Rust compiler handles them.
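To make "handle escapes like the Rust compiler does" concrete, here is a minimal Rust sketch that decodes the body of a string literal using Rust-style escapes, including the two reported as missing (\0 and \u{...}). The unescape function is hypothetical and written only for illustration; it is not VRL's lexer code, and it deliberately omits Rust's \x.. and line-continuation escapes.

```rust
/// Hypothetical sketch: decode backslash escapes in a string-literal body,
/// loosely following Rust's rules. Not VRL's actual lexer code.
fn unescape(input: &str) -> Result<String, String> {
    let mut out = String::with_capacity(input.len());
    let mut chars = input.chars();
    while let Some(c) = chars.next() {
        // Ordinary characters are copied through unchanged.
        if c != '\\' {
            out.push(c);
            continue;
        }
        let esc = chars.next().ok_or("trailing backslash")?;
        match esc {
            'n' => out.push('\n'),
            'r' => out.push('\r'),
            't' => out.push('\t'),
            '\\' => out.push('\\'),
            '"' => out.push('"'),
            '\'' => out.push('\''),
            // The null escape reported as missing in this issue.
            '0' => out.push('\0'),
            'u' => {
                // Rust form: `\u{` + 1 to 6 hex digits + `}`.
                if chars.next() != Some('{') {
                    return Err("expected `{` after \\u".into());
                }
                let mut hex = String::new();
                loop {
                    match chars.next() {
                        Some('}') => break,
                        Some(h) if h.is_ascii_hexdigit() && hex.len() < 6 => hex.push(h),
                        _ => return Err("malformed \\u{...} escape".into()),
                    }
                }
                // An empty digit list fails to parse here.
                let code = u32::from_str_radix(&hex, 16).map_err(|e| e.to_string())?;
                // Rejects surrogates and values above U+10FFFF.
                let ch = char::from_u32(code).ok_or("invalid Unicode scalar value")?;
                out.push(ch);
            }
            other => return Err(format!("invalid escape character: \\{other}")),
        }
    }
    Ok(out)
}
```

For example, unescape(r"\u{1b}[46m") yields the ESC character followed by "[46m", while r"\e[46m" is rejected with "invalid escape character: \e", because Rust has no \e escape.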
I'm glad the omission was just for simplicity of the initial implementation rather than a limitation :) In fairness, I don't think the missing escape characters I reported are a big use case: a long time has passed since the new lexer was introduced, and I haven't seen anyone but me ask for them (in this issue). The use case that led me to discover the missing escape characters is that I was testing my implementations of …
In VRL, I'm having trouble with the limits Rust imposes on escaped characters and regex. PCRE2, Python or Golang style is preferable to Rust or ECMAScript here. Thanks!
Is anyone by any chance working on this? I tried to add the null escape in the new lexer mentioned above, but I can't get it to work; it still reports an error. I'm not very well versed in Rust, so maybe I'm missing something. This is my fork, in case anybody could point me in the right direction; any help would be appreciated: main...arthmoeros:vrl:null_escape_lexer
We have a use case where we need to remove null characters from a message, so a workaround would also be welcome. In the meantime I will try a Lua transform.
@arthmoeros Your code looks fine and works for me. I'm curious: how are you testing it? Perhaps you are still running the Vector project that pulls VRL from GitHub and not from your local repo?
@StephenWakely I compiled Vector after changing every reference to VRL to point at my GitHub repo; I just pushed my fork of Vector where I made the change. So maybe I'm doing something wrong when compiling Vector? (I don't really know what, though, hehe)
Ah, you want to specify:
vrl = { package = "vrl", git = "https://github.com/arthmoeros/vrl", branch = "null_escape_lexer" }
An easier way to test directly in the VRL repo is to just run the CLI project:
> cd lib/cli
> cargo run
Wonderful, thanks for the tip on the CLI project; it worked like a charm and is indeed easier for testing. I just submitted the PR: #219. Thanks a lot for your help!
Would this be the reason that this:
causes this exception:
Is there a workaround for this?
Hi. There are multiple issues with your code, none of which are related to this issue.
This should work:

. |= parse_regex!(
  .message,
  pattern: r'\[(?P<log_time>.+)\]\[(?P<process>\d+):(?P<thread>\d+)\]\[(?P<severity>\w+)\]\[(?P<module>\w+)\.(?P<func>\w+)\] (?P<msg>.*$)'
)
OK, thank you, now I know. I actually had success with the grok patterns, but I have a question. Below I have several parsing patterns.
Example logs with and without app_id:
Is parse_groks the only function that accepts an array of patterns, then? As I learn more, I've read that grok uses regex internally anyway, so I thought it might be a bit more performant to use parse_regex. But I have logs in different formats, and I'm wondering if there is a way to pass these different patterns to parse_regex, or whether I have to make multiple parse_regex calls.
👋 do you mind opening this as a separate discussion post since it is unrelated to this issue? |
Done...sorry for hijacking threads...#246 |
Just chiming in to mention that the docs example for strip_ansi_escape_codes fails in the REPL:
$ vector vrl
$ strip_ansi_escape_codes("\e[46mfoo\e[0m bar")
error[E202]: syntax error
┌─ :1:1
│
1 │ strip_ansi_escape_codes("\e[46mfoo\e[0m bar")
│ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ unexpected error: invalid escape character: \e
│
= see language documentation at https://vrl.dev
  = try your code in the VRL REPL, learn more at https://vrl.dev/examples

In my case, I was actually trying to apply ANSI escape codes for coloured output in the … I wasn't sure whether that was unsupported by the sink, but perhaps it's related to this issue (Vector's own internal log output used colour without issue).
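Note that Rust itself has no \e escape either, so a lexer following Rust's rules would still reject "\e[46m"; in Rust the ESC character has to be written as \x1b or \u{1b} (JSON encodes it as \u001b). As a rough illustration of what stripping such colour codes involves, here is a minimal Rust sketch that removes ANSI CSI sequences: ESC, then '[', then parameter and intermediate bytes, then a final byte in the range 0x40 to 0x7E. The strip_csi name is hypothetical; this is not VRL's strip_ansi_escape_codes implementation, and it leaves other escape types (such as OSC) untouched.

```rust
/// Hypothetical sketch: remove ANSI CSI sequences such as the SGR colour
/// codes in "\x1b[46mfoo\x1b[0m bar". Other escape types are left alone.
fn strip_csi(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    let mut chars = input.chars().peekable();
    while let Some(c) = chars.next() {
        if c == '\x1b' && chars.peek() == Some(&'[') {
            chars.next(); // consume the '[' that starts the CSI sequence
            // Skip parameter/intermediate bytes up to and including the
            // final byte, which lies in 0x40..=0x7E (e.g. 'm' for SGR).
            for b in chars.by_ref() {
                if ('\x40'..='\x7e').contains(&b) {
                    break;
                }
            }
        } else {
            out.push(c);
        }
    }
    out
}
```

With the string from the docs example written using a Rust escape, strip_csi("\x1b[46mfoo\x1b[0m bar") returns "foo bar".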
It is somewhat frustrating that I cannot copy and paste ANSI escape strings (JSON-encoded, e.g. \u..) from logs into vector test to make sure my pipeline handles stripping them correctly. In regexes it is less important, I guess, although it would be nice to have it working there too.
Should probably adjust the documentation on string expressions until support for these escape sequences is re-added https://vector.dev/docs/reference/vrl/expressions/#string-characteristics. I thought I was doing something wrong. |
Fair point. I will try to find some time this week. |
Same applies to … I.e., this works in the actual config:
But try testing that in the REPL 😉 You'll be (unpleasantly) surprised.
From the pointers given by @hhromic, implementing support in the lexer seems more productive than perpetuating the bug by documenting it. I'm confused, though: is the REPL's lexer different from the compiler's? That seems weird 🤔
I ran across this issue just this week. Logs were pulled from GCP Logging via PubSub, and terminal color codes were included in the log itself. I took some tips from another post for testing in the VRL console: adding the log event to a file and running the VRL with … I also found out that using … For the time being, I just updated the agent to strip escapes before writing the logs for the aggregator to pick up.
Problem
I recently found out that the null and unicode character escape sequences are not working in string literal expressions in VRL.
The documentation looks correct, so I think something is off in the VRL parser.
Examples from the VRL REPL:
All the other documented escape sequences seem to be working fine (although not all of them print correctly):
Configuration
Version
vector 0.20.0 (x86_64-unknown-linux-gnu 2a706a3 2022-02-11)
Debug Output
No response
Example Data
No response
Additional Context
No response
References
No response