Skip to content

Commit

Permalink
[analyzer] Limit isTainted() by skipping complicated symbols (llvm#…
Browse files Browse the repository at this point in the history
…105493)

As discussed in

https://discourse.llvm.org/t/rfc-make-istainted-and-complex-symbols-friends/79570/10

Some `isTainted()` queries can blow up the analysis times, and
effectively halt the analysis under specific workloads.

We don't really have the time now to do a caching re-implementation of
`isTainted()`, so we need to workaround the case.

The workaround with the smallest blast radius was to limit what symbols
`isTainted()` does the query (by walking the SymExpr). So far, the
threshold 10 worked for us, but this value can be overridden using the
"max-tainted-symbol-complexity" config value.

This new option is "deprecated" from the getgo, as I expect this issue
to be fixed within the next few months and I don't want users to
override this value anyways. If they do, this message will let them know
that they are on their own, and the next release may break them (as we
no longer recognize this option if we drop it).

Mitigates llvm#89720

CPP-5414

(cherry picked from commit 8486589)
  • Loading branch information
steakhal authored and tru committed Sep 1, 2024
1 parent 5457983 commit 78f97e2
Show file tree
Hide file tree
Showing 5 changed files with 66 additions and 1 deletion.
5 changes: 5 additions & 0 deletions clang/docs/ReleaseNotes.rst
Original file line number Diff line number Diff line change
Expand Up @@ -1407,6 +1407,11 @@ Crash and bug fixes
- Z3 crosschecking (aka. Z3 refutation) is now bounded, and can't consume
more total time than the eymbolic execution itself. (#GH97298)

- In clang-18, we regressed in terms of analysis time for projects having many
nested loops with buffer indexing or shifting or other binary operations.
For example, functions computing different hash values. Some of this slowdown
was attributed to taint analysis, which is fixed now. (#GH105493)

- ``std::addressof``, ``std::as_const``, ``std::forward``,
``std::forward_like``, ``std::move``, ``std::move_if_noexcept``, are now
modeled just like their builtin counterpart. (#GH94193)
Expand Down
5 changes: 5 additions & 0 deletions clang/include/clang/StaticAnalyzer/Core/AnalyzerOptions.def
Original file line number Diff line number Diff line change
Expand Up @@ -407,6 +407,11 @@ ANALYZER_OPTION(
ANALYZER_OPTION(unsigned, MaxSymbolComplexity, "max-symbol-complexity",
"The maximum complexity of symbolic constraint.", 35)

// HACK:https://discourse.llvm.org/t/rfc-make-istainted-and-complex-symbols-friends/79570
// Ideally, we should get rid of this option soon.
ANALYZER_OPTION(unsigned, MaxTaintedSymbolComplexity, "max-tainted-symbol-complexity",
"[DEPRECATED] The maximum complexity of a symbol to carry taint", 9)

ANALYZER_OPTION(unsigned, MaxTimesInlineLarge, "max-times-inline-large",
"The maximum times a large function could be inlined.", 32)

Expand Down
7 changes: 7 additions & 0 deletions clang/lib/StaticAnalyzer/Checkers/Taint.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -12,6 +12,7 @@

#include "clang/StaticAnalyzer/Checkers/Taint.h"
#include "clang/StaticAnalyzer/Core/BugReporter/BugReporter.h"
#include "clang/StaticAnalyzer/Core/PathSensitive/AnalysisManager.h"
#include "clang/StaticAnalyzer/Core/PathSensitive/ProgramStateTrait.h"
#include <optional>

Expand Down Expand Up @@ -256,6 +257,12 @@ std::vector<SymbolRef> taint::getTaintedSymbolsImpl(ProgramStateRef State,
if (!Sym)
return TaintedSymbols;

// HACK:https://discourse.llvm.org/t/rfc-make-istainted-and-complex-symbols-friends/79570
if (const auto &Opts = State->getAnalysisManager().getAnalyzerOptions();
Sym->computeComplexity() > Opts.MaxTaintedSymbolComplexity) {
return {};
}

// Traverse all the symbols this symbol depends on to see if any are tainted.
for (SymbolRef SubSym : Sym->symbols()) {
if (!isa<SymbolData>(SubSym))
Expand Down
1 change: 1 addition & 0 deletions clang/test/Analysis/analyzer-config.c
Original file line number Diff line number Diff line change
Expand Up @@ -96,6 +96,7 @@
// CHECK-NEXT: max-inlinable-size = 100
// CHECK-NEXT: max-nodes = 225000
// CHECK-NEXT: max-symbol-complexity = 35
// CHECK-NEXT: max-tainted-symbol-complexity = 9
// CHECK-NEXT: max-times-inline-large = 32
// CHECK-NEXT: min-cfg-size-treat-functions-as-large = 14
// CHECK-NEXT: mode = deep
Expand Down
49 changes: 48 additions & 1 deletion clang/test/Analysis/taint-generic.c
Original file line number Diff line number Diff line change
Expand Up @@ -63,6 +63,7 @@ void clang_analyzer_isTainted_char(char);
void clang_analyzer_isTainted_wchar(wchar_t);
void clang_analyzer_isTainted_charp(char*);
void clang_analyzer_isTainted_int(int);
void clang_analyzer_dump_int(int);

int coin();

Expand Down Expand Up @@ -459,7 +460,53 @@ unsigned radar11369570_hanging(const unsigned char *arr, int l) {
longcmp(a, t, c);
l -= 12;
}
return 5/a; // expected-warning {{Division by a tainted value, possibly zero}}
return 5/a; // FIXME: Should be a "div by tainted" warning here.
}

// This computation used to take a very long time.
void complex_taint_queries(const int *p) {
int tainted = 0;
scanf("%d", &tainted);

// Make "tmp" tainted.
int tmp = tainted + tainted;
clang_analyzer_isTainted_int(tmp); // expected-warning{{YES}}

// Make "tmp" SymExpr a lot more complicated by applying computation.
// This should balloon the symbol complexity.
tmp += p[0] + p[0];
tmp += p[1] + p[1];
tmp += p[2] + p[2];
clang_analyzer_dump_int(tmp); // expected-warning{{((((conj_}} symbol complexity: 8
clang_analyzer_isTainted_int(tmp); // expected-warning{{YES}}

tmp += p[3] + p[3];
clang_analyzer_dump_int(tmp); // expected-warning{{(((((conj_}} symbol complexity: 10
clang_analyzer_isTainted_int(tmp); // expected-warning{{NO}} 10 is already too complex to be traversed

tmp += p[4] + p[4];
tmp += p[5] + p[5];
tmp += p[6] + p[6];
tmp += p[7] + p[7];
tmp += p[8] + p[8];
tmp += p[9] + p[9];
tmp += p[10] + p[10];
tmp += p[11] + p[11];
tmp += p[12] + p[12];
tmp += p[13] + p[13];
tmp += p[14] + p[14];
tmp += p[15] + p[15];

// The SymExpr still holds the full history of the computation, yet, "isTainted" doesn't traverse the tree as the complexity is over the threshold.
clang_analyzer_dump_int(tmp);
// expected-warning@-1{{(((((((((((((((((conj_}} symbol complexity: 34
clang_analyzer_isTainted_int(tmp); // expected-warning{{NO}} FIXME: Ideally, this should still result in "tainted".

// By making it even one step more complex, then it would hit the "max-symbol-complexity"
// threshold and the engine would cut the SymExpr and replace it by a new conjured symbol.
tmp += p[16];
clang_analyzer_dump_int(tmp); // expected-warning{{conj_}} symbol complexity: 1
clang_analyzer_isTainted_int(tmp); // expected-warning{{NO}}
}

// Check that we do not assert of the following code.
Expand Down

0 comments on commit 78f97e2

Please sign in to comment.