Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

nested ncr() calls can be very slow (potential denial of service) #62

Open
invd opened this issue Jun 25, 2020 · 4 comments · May be fixed by #82
Open

nested ncr() calls can be very slow (potential denial of service) #62

invd opened this issue Jun 25, 2020 · 4 comments · May be fixed by #82

Comments

@invd
Copy link

invd commented Jun 25, 2020

Issue description

I've recently discovered via fuzzing with libFuzzer that nested usage of the ncr() expression can result in unexpectedly long expression evaluation times.

The smallest representation of the problematic expression that I could find so far, as generated by libFuzzer:
ncr(ncr(2,7),1)
I measured around ~60s runtime at -O3 (with fuzzing instrumentation overhead) on a modern AMD Ryzen CPU.

This may be a low-severity security issue for projects that evaluate externally provided expression input.

Debug information:

The undefined behavior sanitizer also complains about the 'unsigned int' range, here are some variants:

tinyexpr.c:127:23: runtime error: nan is outside the range of representable values of type 'unsigned int'
tinyexpr.c:139:28: runtime error: -nan is outside the range of representable values of type 'unsigned int'
tinyexpr.c:139:28: runtime error: nan is outside the range of representable values of type 'unsigned int'
tinyexpr.c:139:52: runtime error: -nan is outside the range of representable values of type 'unsigned int'
tinyexpr.c:139:52: runtime error: nan is outside the range of representable values of type 'unsigned int'

Some gdb debug info for the above mentioned ncr(ncr(2,7),1) case where this happens:

Thread 1 "tinyexpr_fuzzer" hit Breakpoint 1, ncr
(n=nan(0x8000000000000), r=1) at tinyexpr.c:139
139	    unsigned long int un = (unsigned int)(n), ur = (unsigned int)(r), i;
(gdb) bt
#0  ncr (n=nan(0x8000000000000), r=1) at tinyexpr.c:139
#1  0x000000000055d8fe in te_eval (n=0x603000000100) at tinyexpr.c:532
#2  0x0000000000569940 in optimize (n=0x603000000100) at tinyexpr.c:580
#3  0x00000000005681b9 in te_compile (expression=0x7fffffffd780 "ncr(ncr(2,7),1)", variables=0x0, var_count=0, error=0x0) at tinyexpr.c:606
#4  0x0000000000569d6b in te_interp (expression=0x7fffffffd780 "ncr(ncr(2,7),1)", error=0x0) at tinyexpr.c:614
#5  0x000000000055366b in LLVMFuzzerTestOneInput (data=0x6020000000b0 "ncr(ncr(2,7),1)", size=15) at tinyexpr_fuzzer.c:29
(gdb) print r
$1 = 1
(gdb) print n
$2 = nan(0x8000000000000)
(gdb) print i
$3 = 105690555220240

Disclosure information

I have privately reported this issue to @codeplea. He asked me to document it via a public Github issue.

@codeplea
Copy link
Owner

Thanks for the issue.

I don't think it's possible to add a timeout option (without bringing in external dependencies, which I won't do), so I'll probably address this in two ways:

  1. Add a compile-time option to disable the functions that use loops (fac, ncr, npr, possibly others).
  2. Document it in the readme.

@mortie
Copy link
Contributor

mortie commented Oct 12, 2020

Well, it would be possible to add a timeout, albeit not in a very nice way. You could make a tick function which calls time() and compares it to a start time, then bails if the elapsed time is greater than a timeout, then sprinkle in calls to tick() in strategic locations throughout the code. For example, you could call tick() at the top of te_eval and every nth iteration in the loop-based functions.

The balance between performance and accuracy would be super hard to get right, but it would be a useful feature.

@juntuu juntuu linked a pull request Jul 26, 2021 that will close this issue
@juntuu
Copy link

juntuu commented Jul 26, 2021

The cause seems to be ncr(NAN, 1), which goes into pathologically slowly advancing loop for some reason.

Printing the values with %lu format just before the loop, I got:

un = 0
ur = 18446744073709551615

Seeing, that the project is now using c99, isnan should be available, so checking the nan should be reasonable mitigation on this issue. I opened a PR #82 adding those.

@invd
Copy link
Author

invd commented Jun 18, 2024

@codeplea it would be great to see this bug finally fixed via @juntuu 's #82 after 4+ years.

This will also help with @DavidKorczynski 's efforts for #114, since the GitHub fuzzing action jobs will otherwise spend a lot of time on timeouts.

Basic reproducer based on example.c, can be used under the project's zlib license:

#include <stdio.h>
#include <stdint.h>
#include <string.h>

#include "tinyexpr.h"

int main(int argc, char *argv[])
{
    const char *c = "ncr(ncr(2,7),1)";
    double r = te_interp(c, 0);
    printf("the execution of this calculation should be slow");
    return 0;
}

For completeness, here is the fuzzer harness code I used to find the issue:

#include <stdio.h>
#include <stdint.h>
#include <string.h>

#include "tinyexpr.h"

#define FUZZER_MAX_INPUT_LENGTH 512

int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
  
    if(size > FUZZER_MAX_INPUT_LENGTH) {
      return 0;
    }

    char inputbuffer[FUZZER_MAX_INPUT_LENGTH + 1] = {0};
    memcpy(inputbuffer, data, size);
    // null-terminate
    inputbuffer[size] = 0;
    
    // call target
    double r = te_interp(inputbuffer, 0);
    
    return 0;
}

I also make this available under zlib license so it can be used in the project.

Compared with the fuzzer harness proposed in #114, it has a restricted max length through fixed stack buffer usage (-> less heap allocations), and doesn't pre-define expression variables. Otherwise it's fairly similar.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants