How can I build a regex pattern with sequences of non utf-8 bytes? #1253

Answered by BurntSushi
johomo asked this question in Q&A
This effectively means that Regex cannot be used to find a sequence of utf8-invalid bytes in a haystack of bytes.

That's incorrect. Here's a counter-example:

use regex::bytes::Regex;

fn main() {
    let haystack = b"foo bar\xFF baz";
    let re = Regex::new(r"(?-u)\xFF").unwrap();
    assert_eq!(re.find(haystack).map(|r| r.range()), Some(7..8));
}

I feel I am missing something big.

You might have missed this section in the docs: https://docs.rs/regex/latest/regex/bytes/index.html#syntax

The relevant sections there are:

The u flag can be disabled even when disabling it might cause the regex to match invalid UTF-8. When the u flag is disabled, the regex is said to be i…

Answer selected by BurntSushi