vrl: null and unicode escapes not working in string literals #148

hhromic · 2022-03-21T21:36:38Z

A note for the community

Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
If you are interested in working on this issue or have submitted a pull request, please leave a comment

Problem

I recently found out that the null and unicode character escape sequences are not working in string literal expressions in VRL.
The documentation makes sense, therefore I think something is off in the VRL parser.

Examples from the VRL REPL:

$ "hello\0world"
error[E202]: syntax error
  ┌─ :1:1
  │
1 │ "hello\0world"
  │ ^^^^^^^^^^^^^^ unexpected error: invalid escape character: \0
  │
  = see language documentation at https://vrl.dev

$ "hello\u{1F30E}world"  # also tested with \U{1F30E} and \u1F30E just in case
error[E202]: syntax error
  ┌─ :1:1
  │
1 │ "hello\u{1F30E}world"
  │ ^^^^^^^^^^^^^^^^^^^^^ unexpected error: invalid escape character: \u
  │
  = see language documentation at https://vrl.dev

All the other documented escape sequences seems to be working fine (although not all printed correctly):

$ "hello\nworld"
"hello\nworld"

$ "hello\"world"
"hello\"world"

$ "hello\'world"
"hello'world"

$ "hello\\world"
"hello\\world"

$ "hello\nworld"
"hello\nworld"

$ "hello\rworld"  # \r (carriage-return) was rendered instead of printed with an escape
world"

$ "hello\tworld"  # \t (tab) was rendered instead of printed with an escape
"hello  world"

Configuration

N/A

Version

vector 0.20.0 (x86_64-unknown-linux-gnu 2a706a3 2022-02-11)

Debug Output

No response

Example Data

No response

Additional Context

No response

References

No response

The text was updated successfully, but these errors were encountered:

hhromic · 2022-03-21T23:06:41Z

Looks like inded those escape characters are not implemented in the VRL lexer:

https://github.com/vectordotdev/vector/blob/7cb951daed57448b80463303a07184ed537d9c54/lib/vrl/parser/src/lex.rs#L1112-L1128

https://github.com/vectordotdev/vector/blob/7cb951daed57448b80463303a07184ed537d9c54/lib/vrl/parser/src/lex.rs#L1164-L1197

hhromic · 2022-03-21T23:17:27Z

After further investigation, it seems that these escape sequences used to work before, but they were not ported over in the VRL parser/compiler re-implementation PR vectordotdev/vector#6353.

jszwedko · 2022-03-22T21:56:04Z

Interesting, thanks for the sleuthing! cc/ @StephenWakely @JeanMertz for thoughts on this.

JeanMertz · 2022-03-23T04:54:25Z

This is correct. I didn't port them over, not for any particular reason, other than wanting to start out simple, and adding more escape sequences as requested by the community.

I think it makes sense to handle escape characters similar to how the Rust compiler handles them.

hhromic · 2022-03-23T10:23:39Z

I'm glad the omission was just for simplicity of the initial implementation instead of a limitation :)
And yes, having parity with Rust's escape character mechanism would be very nice and complete indeed.

In fairness, I don't think the missing escape characters I reported are a big use-case, I haven't seen anyone asking for them except for me in this issue after a long time since the new lexer was introduced.

The use-case for me that triggered discovering the missing escape characters is that I was testing my implementations of ip_ntop() and ip_pton() using the VRL REPL and I wasn't able to generate strings with arbitrary bytes. In other words, the escape I needed was the binary \xNN escape. Then the documentation pointed me to \uNNNNNN (which is not exactly the same but close-enough), which didn't work either.

EdN-terascope · 2023-02-23T19:17:47Z

In VRL, I'm having troubles with the limits Rust has for escaped characters and regex. PCRE2, Python or Golang style is preferable to Rust or ECMAScript here.

Thx

arthmoeros · 2023-04-28T21:13:14Z

Is anyone by any chances working on this?, I did try to add null escape in the mentioned new lexer, but I can't get it to work, still says \0 is an invalid escape.

I'm not very well versed on Rust so maybe I'm missing something, this is my fork, if anybody could point me in the right direction, any help would be appreciated.

main...arthmoeros:vrl:null_escape_lexer

We do have a use case where we need to remove null characters in a message, a workaround would also be welcome. In the meantime I will try with a lua transform.

StephenWakely · 2023-05-04T12:51:34Z

@arthmoeros Your code looks fine and works for me.

I'm curious how you are testing it? Perhaps you are still running the Vector project that is pulling VRL from Github and not from your local repo?

arthmoeros · 2023-05-04T13:14:17Z

@StephenWakely I compiled Vector changing every reference of VRL using my Github repo, I just pushed my fork of Vector where I made the change.

arthmoeros/vector@4f8eb9e

So maybe I'm doing something wrong on Vector compilation? (I dont really know what though hehe)

StephenWakely · 2023-05-04T14:07:28Z

@StephenWakely I compiled Vector changing every reference of VRL using my Github repo, I just pushed my fork of Vector where I made the change.

arthmoeros/vector@4f8eb9e

So maybe I'm doing something wrong on Vector compilation? (I dont really know what though hehe)

Ah, you want to specify branch rather than rev.

vrl = { package = "vrl", git = "https://github.com/arthmoeros/vrl", branch = "null_escape_lexer" }

An easier way to test directly in the VRL repo is to just run the cli project:

> cd lib/cli
> cargo run

arthmoeros · 2023-05-04T15:32:53Z

@StephenWakely I compiled Vector changing every reference of VRL using my Github repo, I just pushed my fork of Vector where I made the change.
arthmoeros/vector@4f8eb9e
So maybe I'm doing something wrong on Vector compilation? (I dont really know what though hehe)

Ah, you want to specify branch rather than rev.
vrl = { package = "vrl", git = "https://github.com/arthmoeros/vrl", branch = "null_escape_lexer" }
An easier way to test directly in the VRL repo is to just run the cli project:
> cd lib/cli
> cargo run

Wonderful, thanks for the tip on the cli project, worked like charm and indeed easier to test.

Just submitted the PR: #219

Thanks a lot for your help!

dss010101 · 2023-05-15T04:23:13Z

would this be the reason that this:

source = '''
    . |= parse_regex!(
            .message, 
            pattern = [r"\[(?P<log_time>.+)\]\[(?P<process>\d+):(?P<thread>\d+)\]\[(?P<severity>\w+)\]\[(?P<module>\w+)\.(?P<func>\w+)\] (?P<msg>.*$)"]
    )
    .log_time, err = parse_timestamp(.log_time, "%Y-%m-%d %H:%M:%S,%f")
    .process = to_int!(.process)
    .thread = to_int!(.thread)
'''

causes this exception:


        | 2023-05-15T04:21:51.724291Z  INFO vector::app: Loading configs. paths=["/etc/vector/vector.toml"]
        | 2023-05-15T04:21:51.747923Z ERROR vector::topology: Configuration error. error=Transform "log_parser":
        | error[E202]: syntax error
        |   ┌─ :1:1
        |   │
        | 1 │ ╭     . |= parse_regex!(
        | 2 │ │             .message,
        | 3 │ │             pattern = [r"\[(?P<log_time>.+)\]\[(?P<process>\d+):(?P<thread>\d+)\]\[(?P<severity>\w+)\]\[(?P<module>\w+)\.(?P<func>\w+)\] (?P<msg>.*$)"]
        | 4 │ │     )
        |   · │
        | 7 │ │     .thread = to_int!(.thread)
        | 8 │ │
        |   │ ╰^ unexpected error: invalid escape character: \[

Is there a workaround for this?

StephenWakely · 2023-05-15T11:00:18Z

Is there a workaround for this?

Hi. There are multiple issues with your code, none of which are related to this issue.

Named parameters use : - so use pattern: rather than pattern = for the pattern parameter.
Regex literals use r'..' rather than r".." - this is the primary cause the error you are getting.
The pattern parameter is a single regex, not an array - so don't enclose the pattern with [...].

This should work:

. |= parse_regex!(
            .message, 
            pattern: r'\[(?P<log_time>.+)\]\[(?P<process>\d+):(?P<thread>\d+)\]\[(?P<severity>\w+)\]\[(?P<module>\w+)\.(?P<func>\w+)\] (?P<msg>.*$)'
    )

dss010101 · 2023-05-15T17:20:20Z

Is there a workaround for this?

Hi. There are multiple issues with your code, none of which are related to this issue.

Named parameters use : - so use pattern: rather than pattern = for the pattern parameter.

Regex literals use r'..' rather than r".." - this is the primary cause the error you are getting.

The pattern parameter is a single regex, not an array - so don't enclose the pattern with [...].

This should work:
. |= parse_regex!(
            .message, 
            pattern: r'\[(?P<log_time>.+)\]\[(?P<process>\d+):(?P<thread>\d+)\]\[(?P<severity>\w+)\]\[(?P<module>\w+)\.(?P<func>\w+)\] (?P<msg>.*$)'
    )

ok thank you - now i know. i actually had success with the grok patterns. but have a question. In the below i have several parsing patterns.
if the 'msg' field i parse out may contain a pattern like 'app_id=app_1234', can i post parse and add 'app_id' as a field, rather than include another parser pattern for it as i do below? It seems it would be cleaner to post parse and add optional fields that may or may not be in the msg. Are there any examples of doing this?

source = '''
    . |= parse_groks!(
            .message, 
            patterns:[
                "\\[%{TIMESTAMP_ISO8601:log_time}\\]\\[%{INT:process}:%{INT:thread}\\]\\[%{WORD:severity}\\]\\[%{WORD:module}.%{WORD:func}\\] app_id:%{NOTSPACE:app_id} %{GREEDYDATA:msg}",
                "\\[%{TIMESTAMP_ISO8601:log_time}\\]\\[%{INT:process}:%{INT:thread}\\]\\[%{WORD:severity}\\]\\[%{WORD:module}.%{WORD:func}\\] %{GREEDYDATA:msg}",
                "\\[%{TIMESTAMP_ISO8601:log_time}\\]\\[%{INT:process}:%{INT:thread}\\]%{WORD:severity}:%{WORD:module}:%{GREEDYDATA:msg}"
            ]
    )

'''

example logs with/and without app_id:

[2023-05-15 11:52:29,401][2555:140431094425152][INFO][default.run] initalizing application
[2023-05-15 11:52:29,401][2555:140431094425152][INFO][default.run] app_id:my_app_12345 initialized.

dss010101 · 2023-05-16T04:24:54Z

single regex, not an array - so don't enclose the pattern with [...].

Is it only parse_groks that allow any array of patterns then? as im learning more - i read that grok internally uses regex anyway, so i thought it may be a bit more performant if i use parse_regex. but i have logs that have different formats...and wondering if there is a way to pass these different patterns to parse_regex. or do i have to have multiple parse_regex calls?

jszwedko · 2023-05-16T14:37:22Z

single regex, not an array - so don't enclose the pattern with [...].

Is it only parse_groks that allow any array of patterns then? as im learning more - i read that grok internally uses regex anyway, so i thought it may be a bit more performant if i use parse_regex. but i have logs that have different formats...and wondering if there is a way to pass these different patterns to parse_regex. or do i have to have multiple parse_regex calls?

👋 do you mind opening this as a separate discussion post since it is unrelated to this issue?

dss010101 · 2023-05-16T22:51:13Z

Done...sorry for hijacking threads...#246

polarathene · 2023-11-25T08:34:50Z

Just chiming in to mention that the docs example for strip_ansi_escape_codes!() fails with \e escape code. It suggests to use the VRL REPL which likewise fails:

$ vector vrl
$ strip_ansi_escape_codes("\e[46mfoo\e[0m bar")

error[E202]: syntax error
  ┌─ :1:1
  │
1 │ strip_ansi_escape_codes("\e[46mfoo\e[0m bar")
  │ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ unexpected error: invalid escape character: \e
  │
  = see language documentation at https://vrl.dev
  = try your code in the VRL REPL, learn more at https://vrl.dev/examples

In my case, I was actually trying to apply ANSI escape codes for coloured output in the console sink, which either outputs unescaped (or partial) ANSI codes as plain text, or fails with an error like above.

I wasn't sure if it was unsupported by the sink, but perhaps it's related to this issue (Vectors own internal log output leveraged colour without issue):

fingon · 2024-03-07T08:35:56Z

It is somewhat frustrating that I cannot cut-n-paste (json encoded, e.g. \u..) ANSI escape strings from logs to vector test so that I could ensure that my pipeline (correctly) handles their stripping. In regexps it is less important, I guess, although it would be nice to have it working there too.

DylanRJohnston · 2024-07-10T06:17:26Z

Should probably adjust the documentation on string expressions until support for these escape sequences is re-added https://vector.dev/docs/reference/vrl/expressions/#string-characteristics. I thought I was doing something wrong.

pront · 2025-01-14T19:29:00Z

Should probably adjust the documentation on string expressions until support for these escape sequences is re-added https://vector.dev/docs/reference/vrl/expressions/#string-characteristics. I thought I was doing something wrong.

Fair point. I will try to find some time this week.

ulidtko · 2025-02-18T17:56:26Z

Same applies to \x sequences — understood by the compiler, but syntax error in REPL.

I.e. this works in actual config:

.message = replace(.message, r'\x1b\[[0-9;]*m', "") # strip colors

But try testing that in the repl 😉 You'll be surprised (unpleasantly).

Should probably adjust the documentation

From the pointers given by @hhromic, implementing support in the lexer seems more productive than eternalizing the bug by documenting it.

I'm confused though, is the lexer of repl different from the one of compiler?.. That seems weird 🤔

zpriddy · 2025-02-26T16:53:48Z

I just ran across this issue this week. Logs were pulled from GCP Logging via PubSub. There were terminal color encodings that were included in the log itself.

I took some tips from another post for testing in the VRL console. Adding the log event to a file and running the VRL with --input as vector vrl --input file_with_events.json worked with the unicode in the event.

Also found out that using s'... subj=system_u:system_r:init_t:s0 key=(null)\u001dARCH=x86_64 SYSCALL= ... did work for stripping the escape..

For the time I just updated the agent to strip escapes before writing the logs for the aggregator to pick up..

hhromic added the type: bug A code related bug label Mar 21, 2022

JeanMertz added vrl: parser Changes to the syntax parser type: enhancement A value-adding code change that enhances its existing functionality and removed type: bug A code related bug labels Jun 7, 2022

hhromic changed the title ~~VRL: null and unicode escapes not working in string literals~~ vrl: null and unicode escapes not working in string literals Sep 13, 2022

jszwedko added type: bug A code related bug and removed type: enhancement A value-adding code change that enhances its existing functionality labels Mar 9, 2023

fuchsnj transferred this issue from vectordotdev/vector Mar 28, 2023

bruceg mentioned this issue Apr 17, 2023

matching "\0" in message is invalid vectordotdev/vector#17158

Closed

StephenWakely mentioned this issue May 4, 2023

added supported null \0 escape character to new lexer #219

Closed

arthmoeros mentioned this issue May 26, 2023

Added null escape to lexer #259

Merged

alex-rowe mentioned this issue Oct 10, 2023

Parsing auditd logs has issues due to a 0x1D character #502

Open

jszwedko mentioned this issue Jul 26, 2024

can not split message by \u0001 vectordotdev/vector#20884

Closed

jszwedko mentioned this issue Jan 14, 2025

Unicode character escaping doesn't work #1220

Closed

u1735067 mentioned this issue Feb 18, 2025

VRL REPL doesn't parse "\u" neither "\x" escapes #1275

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

vrl: null and unicode escapes not working in string literals #148

vrl: null and unicode escapes not working in string literals #148

hhromic commented Mar 21, 2022

hhromic commented Mar 21, 2022

hhromic commented Mar 21, 2022

jszwedko commented Mar 22, 2022

JeanMertz commented Mar 23, 2022

hhromic commented Mar 23, 2022

EdN-terascope commented Feb 23, 2023

arthmoeros commented Apr 28, 2023 •

edited

Loading

StephenWakely commented May 4, 2023

arthmoeros commented May 4, 2023

StephenWakely commented May 4, 2023

arthmoeros commented May 4, 2023

dss010101 commented May 15, 2023

StephenWakely commented May 15, 2023

dss010101 commented May 15, 2023

dss010101 commented May 16, 2023 •

edited

Loading

jszwedko commented May 16, 2023

dss010101 commented May 16, 2023

polarathene commented Nov 25, 2023

fingon commented Mar 7, 2024 •

edited

Loading

DylanRJohnston commented Jul 10, 2024

pront commented Jan 14, 2025

ulidtko commented Feb 18, 2025

zpriddy commented Feb 26, 2025

vrl: null and unicode escapes not working in string literals #148

vrl: null and unicode escapes not working in string literals #148

Comments

hhromic commented Mar 21, 2022

A note for the community

Problem

Configuration

Version

Debug Output

Example Data

Additional Context

References

hhromic commented Mar 21, 2022

hhromic commented Mar 21, 2022

jszwedko commented Mar 22, 2022

JeanMertz commented Mar 23, 2022

hhromic commented Mar 23, 2022

EdN-terascope commented Feb 23, 2023

arthmoeros commented Apr 28, 2023 • edited Loading

StephenWakely commented May 4, 2023

arthmoeros commented May 4, 2023

StephenWakely commented May 4, 2023

arthmoeros commented May 4, 2023

dss010101 commented May 15, 2023

StephenWakely commented May 15, 2023

dss010101 commented May 15, 2023

dss010101 commented May 16, 2023 • edited Loading

jszwedko commented May 16, 2023

dss010101 commented May 16, 2023

polarathene commented Nov 25, 2023

fingon commented Mar 7, 2024 • edited Loading

DylanRJohnston commented Jul 10, 2024

pront commented Jan 14, 2025

ulidtko commented Feb 18, 2025

zpriddy commented Feb 26, 2025

arthmoeros commented Apr 28, 2023 •

edited

Loading

dss010101 commented May 16, 2023 •

edited

Loading

fingon commented Mar 7, 2024 •

edited

Loading