UTF8 string / regex issue #326

Open
apismensky opened this issue Jan 24, 2025 · 0 comments
Regex: \x{ff15}\x{ff10}\x{ff17}\x{ff15}\x{ff10}[\x{ff10}-\x{ff19}]{7}
Input string: ５０７５０７８３２４０１ (fullwidth digits, U+FF10 to U+FF19)
The pattern matches this input in regex101, but does not match with vectorscan.
Steps to reproduce:

TEST(utf8, charclass) {
    vector<pattern> patterns;
    patterns.push_back(pattern(R"(\x{ff15}\x{ff10}\x{ff17}\x{ff15}\x{ff10}[\x{ff10}-\x{ff19}]{7})", HS_FLAG_DOTALL | HS_FLAG_PREFILTER | HS_FLAG_MULTILINE | HS_FLAG_CASELESS | HS_FLAG_UCP | HS_FLAG_UTF8, 1));
    const char *data = "５０７５０７８３２４０１"; // fullwidth digits U+FF10..U+FF19 (source file saved as UTF-8)

    hs_database_t *db = buildDB(patterns, HS_MODE_NOSTREAM);
    ASSERT_NE(nullptr, db);

    hs_scratch_t *scratch = nullptr;
    hs_error_t err = hs_alloc_scratch(db, &scratch);
    ASSERT_EQ(HS_SUCCESS, err);

    CallBackContext c;
    err = hs_scan(db, data, strlen(data), 0, scratch, record_cb,
                  (void *)&c);
    ASSERT_EQ(HS_SUCCESS, err);

    EXPECT_EQ(1, countMatchesById(c.matches, 1));
    err = hs_free_scratch(scratch);
    ASSERT_EQ(HS_SUCCESS, err);
    hs_free_database(db);
}

The test fails when I run it locally (macOS, M1).
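
The pattern targets fullwidth digits (U+FF10 to U+FF19), so the buffer passed to hs_scan has to hold their UTF-8 encoding, three bytes per digit, rather than ASCII digits. A minimal sketch for building that input from explicit code points, so the result does not depend on editor or source-file encoding, might look like the following. The helper names `encode_utf8` and `to_fullwidth_digits` are hypothetical and are not part of vectorscan or its unit tests.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Encode one Unicode code point as UTF-8.
// Sketch only: surrogate code points are not rejected.
std::string encode_utf8(std::uint32_t cp) {
    assert(cp <= 0x10FFFF);
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Map a string of ASCII digits to fullwidth digits (U+FF10..U+FF19),
// returned as UTF-8 bytes.
std::string to_fullwidth_digits(const std::string &ascii_digits) {
    std::string out;
    for (char ch : ascii_digits) {
        assert(ch >= '0' && ch <= '9');
        out += encode_utf8(0xFF10u + static_cast<std::uint32_t>(ch - '0'));
    }
    return out;
}
```

With these helpers, `to_fullwidth_digits("507507832401")` yields 36 bytes beginning with EF BC 95, and the resulting `std::string` can be passed to hs_scan via `.data()` and `.size()` in place of the literal and `strlen`.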
