Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

escaping not working properly for regex #1339

Open
subbyte opened this issue Feb 21, 2023 · 6 comments
Open

escaping not working properly for regex #1339

subbyte opened this issue Feb 21, 2023 · 6 comments
Labels
bug Something isn't working

Comments

@subbyte
Copy link
Member

subbyte commented Feb 21, 2023

Describe the bug
stix-shifter v4.6.0 requires 4 backslash chars to match 1 backslash char in data for regex, which should not be expected.

To Reproduce
Add the two test cases in stix_shifter_modules/elastic_ecs/tests/stix_translation/test_elastic_ecs_stix_to_query.py:

    def test_exactmatch_escaped(self):
        stix_pattern = r"[artifact:payload_bin = 'C:\\Windows\\system32\\svchost\\.exe']"
        translated_query = translation.translate('elastic_ecs', 'query', '{}', stix_pattern)
        translated_query['queries'] = _remove_timestamp_from_query(translated_query['queries'])
        test_query = [r'event.original : "C:\\Windows\\system32\\svchost\\.exe"']
        _test_query_assertions(translated_query, test_query)

    def test_regexmatch_escaped(self):
        stix_pattern = r"[artifact:payload_bin MATCHES 'C:\\Windows\\system32\\svchost\\.exe']"
        #stix_pattern = r"[artifact:payload_bin MATCHES 'C:\\\\Windows\\\\system32\\\\svchost\\\\.exe']"
        translated_query = translation.translate('elastic_ecs', 'query', '{}', stix_pattern)
        translated_query['queries'] = _remove_timestamp_from_query(translated_query['queries'])
        test_query = [r'event.original : /C:\\Windows\\system32\\svchost\\.exe/']
        _test_query_assertions(translated_query, test_query)

Expected behavior
Both tests passed.

Error
The first passed, but not the second:

============================================== FAILURES ==============================================
______________________________ TestStixtoQuery.test_regexmatch_escaped _______________________________

self = <test_elastic_ecs_stix_to_query.TestStixtoQuery testMethod=test_regexmatch_escaped>

    def test_regexmatch_escaped(self):
        stix_pattern = r"[artifact:payload_bin MATCHES 'C:\\Windows\\system32\\svchost\\.exe']"
        #stix_pattern = r"[artifact:payload_bin MATCHES 'C:\\\\Windows\\\\system32\\\\svchost\\\\.exe']"
        translated_query = translation.translate('elastic_ecs', 'query', '{}', stix_pattern)
        translated_query['queries'] = _remove_timestamp_from_query(translated_query['queries'])
        test_query = [r'event.original : /C:\\Windows\\system32\\svchost\\.exe/']
>       _test_query_assertions(translated_query, test_query)

tests/stix_translation/test_elastic_ecs_stix_to_query.py:279: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

translated_query = {'queries': ['event.original : /C:[^a-zA-Z0-9_]indows[ \t\n\r\x0c\x0b]ystem32[ \t\n\r\x0c\x0b]vchost\\.exe/']}
test_query = ['event.original : /C:\\\\Windows\\\\system32\\\\svchost\\\\.exe/']

    def _test_query_assertions(translated_query, test_query):
>       assert translated_query['queries'] == test_query
E       AssertionError: assert ['event.origi...chost\\.exe/'] == ['event.origi...ost\\\\.exe/']
E         At index 0 diff: 'event.original : /C:[^a-zA-Z0-9_]indows[ \t\n\r\x0c\x0b]ystem32[ \t\n\r\x0c\x0b]vchost\\.exe/' != 'event.original : /C:\\\\Windows\\\\system32\\\\svchost\\\\.exe/'
E         Use -v to get more diff

tests/stix_translation/test_elastic_ecs_stix_to_query.py:15: AssertionError

If you uncomment the line

        #stix_pattern MATCHES r"[artifact:payload_bin = 'C:\\\\Windows\\\\system32\\\\svchost\\\\.exe']"

It will work, which indicate one needs to use 4 backslashes to represent 1 backslash in data.

Desktop (please complete the following information):

  • OS: macOS
  • Python: 3.9.6
  • stix-shifter: 4.6.0
@subbyte subbyte added the bug Something isn't working label Feb 21, 2023
@subbyte
Copy link
Member Author

subbyte commented Feb 21, 2023

I test stix2matcher and it has the same behavior.

I think this occurs since any string needs to escape \ to be \\, so regex needs to have \ be \\\\. But why does non-regex need to escape \? It makes sense to escape ' but why \?

The rule of string escaping of STIX pattern is a little unclear to me:

The escape character is the backslash ('' U+005c). Only the apostrophe or the backslash may follow, and in that case, the respective character is used for the sequence.

It only rules ' and \ can follow the escaping char \, but not say whether one needs to escape the data char \. Any comment, @JasonKeirstead ?

I started a vote in the kestrel channel for users to throw their opinions.

@MaxwellDPS
Copy link

MaxwellDPS commented Mar 3, 2023

Seeing this issue as well, if an apostrophe is in the indicator it will crash.

Example pattern: [url:value = 'https://legit.but.bad.practices.site/\'https://forsure.nota.malware/c2/\''] In memory is is escaped with \\[url:value = 'https://legit.but.bad.practices.site/\\'https://forsure.nota.malware/c2/\\'']

But this results in
line 1:0 mismatched input 'url' expecting {'(', '['} Caught exception: pop from empty list <class 'stix_shifter_utils.stix_translation.src.patterns.parser.ParserError'> received exception => ParserError: pop from empty list

I have tried 1 - 6 escapes, and no go

Thanks!

@subbyte
Copy link
Member Author

subbyte commented Mar 4, 2023

@MaxwellDPS I think you find another issue, related.

After some digging, I realize the issue I posted here is actually a design, not a bug. One can think the value string in STIX pattern the same grammar as Python string (with same escaping rules), so one really need to use \\\\ in regex to indicate \ char. The same happens in Python (the cited sentence below is at Python doc here):

for example, to match a literal backslash, one might have to write '\\\\' as the pattern string

So we cannot do anything to improve the situation unless STIX pattern redefines value string not as escaped string. I am considering to provide that (non-escaped string) as default Kestrel value string to make it simpler for users.

The problem you found is a real bug in stix-shifter implementation. Basically stix-shifter failed to parse escaped single quote (which is supposed to work with STIX pattern standard):

    # this test passes
    def test_urlvalue(self):
        stix_pattern = r"""[url:value = 'https://legit.but.bad.practices.site/']"""
        translated_query = translation.translate('elastic_ecs', 'query', '{}', stix_pattern)
        translated_query['queries'] = _remove_timestamp_from_query(translated_query['queries'])
        test_query = [r'''url.original : "https://legit.but.bad.practices.site/"''']
        _test_query_assertions(translated_query, test_query)

    # this test fails
    def test_singlequote_escaping(self):
        stix_pattern = r"""[url:value = 'https://legit.but.b\'ad.practices.site/']"""
        translated_query = translation.translate('elastic_ecs', 'query', '{}', stix_pattern)
        translated_query['queries'] = _remove_timestamp_from_query(translated_query['queries'])
        test_query = [r'''url.original : "https://legit.but.b'ad.practices.site/"''']
        _test_query_assertions(translated_query, test_query)

# log from pytest for the failed test
translated_query = {'queries': ['url.original : "https://legit.but.b\\\\\'ad.practices.site/"']}
test_query = ['url.original : "https://legit.but.b\'ad.practices.site/"']

    def _test_query_assertions(translated_query, test_query):
>       assert translated_query['queries'] == test_query
E       assert ['url.origina...tices.site/"'] == ['url.origina...tices.site/"']
E         At index 0 diff: 'url.original : "https://legit.but.b\\\\\'ad.practices.site/"' != 'url.original : "https://legit.but.b\'ad.practices.site/"'
E         Use -v to get more diff

stix_shifter_modules/elastic_ecs/tests/stix_translation/test_elastic_ecs_stix_to_query.py:15: AssertionError

@pcoccoli
Copy link
Contributor

Escaping is complicated; each module needs to implement the escaping rules of the data source they will query. E.g. for elasticsearch:

The reserved characters are: + - = && || > < ! ( ) { } [ ] ^ " ~ * ? : \ /

See https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-query-string-query.html#_reserved_characters
This is after any Python or STIX escaping is done, before producing the final query.

@mdazam1942
Copy link
Member

The earlier implementation might have missed some constraints. Escaping values needs to be fixed in query constructor based on the datasource query constraints that @pcoccoli mentioned:

def _escape_value(value, comparator=None) -> str:

@subbyte
Copy link
Member Author

subbyte commented May 8, 2023

Add raw string support in Kestrel to resolve the four baskslash issue opencybersecurityalliance/kestrel-lang#329 .

However, the ' in string bug found by @MaxwellDPS is not fixed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

4 participants