-
Notifications
You must be signed in to change notification settings - Fork 11.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changing ArrayBoundV2 callback type increased symbol complexity #89720
Comments
@steakhal By the way, did you happen to have I'm asking this because one very significant effect of my commit #72107 If V1 was enabled in the bisection, then perhaps the observed increase in the complexity is just caused by the fact that previously the V2 checker didn't see those complex symbols (because the V1 checker found and handled them first). (I don't think that this is the most likely explanation of the situation, but it's worth to mention and check.) |
@llvm/issue-subscribers-clang-static-analyzer Author: None (NagyDonat)
While investigating #89045 @steakhal noticed that the complexity of symbols seen by the ArrayBoundV2 checker increased significantly when that checker switched to using `check::PostStmt` callbacks instead of the previously used `check::Location`.
> [...] prior to this change, the maximum Sym->computeComplexity() within getTaintedSymbolsImpl was significantly lower [4-5] than after the change. After the change this maximal complexity was more around the threshold (35), 30-33. That slowdown was fixed by #89606 which restored the efficiency of |
I believe we only use/used OOBv2. |
Thanks, that's what I guessed. |
I spent some time investigating this issue and I would summarize my research as "I still don't know the answer, but I realized that the question is not interesting". The increased symbol complexity appeared after my old change #72107 "[analyzer] Switch to PostStmt callbacks in ArrayBoundV2" moved the activation of ArrayBoundV2 to a different point within the process of the path-sensitive analysis -- so the natural conclusion is that those more complex symbols were always there at that different step of the process. (If we are measuring river pollution and notice that the pollution increased when we started to take samples from a different location, the natural conclusion is that the new spot is simply more polluted.) I have several rough conjectures that could lead to this difference between the
I wasn't able to understand exactly what happens between the As the investigation is difficult, the complex symbols don't cause any concrete problems (after #89606) and there is probably an innocent explanation, I don't think that it's worth to spend more time on this issue. @steakhal What do you think about this? Woudl you agree with closing this ticket? |
Thanks for your time. I think what you say makes sense. I didn't see yet glaring drawbacks of having these "complicated symbols" other than sometimes their dump took 26 megabytes. Given that ATM I'm not focusing on this checker, I'm fine with closing this. |
As discussed in https://discourse.llvm.org/t/rfc-make-istainted-and-complex-symbols-friends/79570/10 Some `isTainted()` queries can blow up the analysis times, and effectively halt the analysis under specific workloads. We don't really have the time now to do a caching re-implementation of `isTainted()`, so we need to workaround the case. The workaround with the smallest blast radius was to limit what symbols `isTainted()` does the query (by walking the SymExpr). So far, the threshold 10 worked for us, but this value can be overridden using the "max-tainted-symbol-complexity" config value. This new option is "deprecated" from the getgo, as I expect this issue to be fixed within the next few months and I don't want users to override this value anyways. If they do, this message will let them know that they are on their own, and the next release may break them (as we no longer recognize this option if we drop it). Mitigates #89720 CPP-5414
…105493) As discussed in https://discourse.llvm.org/t/rfc-make-istainted-and-complex-symbols-friends/79570/10 Some `isTainted()` queries can blow up the analysis times, and effectively halt the analysis under specific workloads. We don't really have the time now to do a caching re-implementation of `isTainted()`, so we need to workaround the case. The workaround with the smallest blast radius was to limit what symbols `isTainted()` does the query (by walking the SymExpr). So far, the threshold 10 worked for us, but this value can be overridden using the "max-tainted-symbol-complexity" config value. This new option is "deprecated" from the getgo, as I expect this issue to be fixed within the next few months and I don't want users to override this value anyways. If they do, this message will let them know that they are on their own, and the next release may break them (as we no longer recognize this option if we drop it). Mitigates llvm#89720 CPP-5414
…105493) As discussed in https://discourse.llvm.org/t/rfc-make-istainted-and-complex-symbols-friends/79570/10 Some `isTainted()` queries can blow up the analysis times, and effectively halt the analysis under specific workloads. We don't really have the time now to do a caching re-implementation of `isTainted()`, so we need to workaround the case. The workaround with the smallest blast radius was to limit what symbols `isTainted()` does the query (by walking the SymExpr). So far, the threshold 10 worked for us, but this value can be overridden using the "max-tainted-symbol-complexity" config value. This new option is "deprecated" from the getgo, as I expect this issue to be fixed within the next few months and I don't want users to override this value anyways. If they do, this message will let them know that they are on their own, and the next release may break them (as we no longer recognize this option if we drop it). Mitigates llvm#89720 CPP-5414 (cherry picked from commit 8486589)
…105493) As discussed in https://discourse.llvm.org/t/rfc-make-istainted-and-complex-symbols-friends/79570/10 Some `isTainted()` queries can blow up the analysis times, and effectively halt the analysis under specific workloads. We don't really have the time now to do a caching re-implementation of `isTainted()`, so we need to workaround the case. The workaround with the smallest blast radius was to limit what symbols `isTainted()` does the query (by walking the SymExpr). So far, the threshold 10 worked for us, but this value can be overridden using the "max-tainted-symbol-complexity" config value. This new option is "deprecated" from the getgo, as I expect this issue to be fixed within the next few months and I don't want users to override this value anyways. If they do, this message will let them know that they are on their own, and the next release may break them (as we no longer recognize this option if we drop it). Mitigates llvm#89720 CPP-5414 (cherry picked from commit 8486589)
…105493) As discussed in https://discourse.llvm.org/t/rfc-make-istainted-and-complex-symbols-friends/79570/10 Some `isTainted()` queries can blow up the analysis times, and effectively halt the analysis under specific workloads. We don't really have the time now to do a caching re-implementation of `isTainted()`, so we need to workaround the case. The workaround with the smallest blast radius was to limit what symbols `isTainted()` does the query (by walking the SymExpr). So far, the threshold 10 worked for us, but this value can be overridden using the "max-tainted-symbol-complexity" config value. This new option is "deprecated" from the getgo, as I expect this issue to be fixed within the next few months and I don't want users to override this value anyways. If they do, this message will let them know that they are on their own, and the next release may break them (as we no longer recognize this option if we drop it). Mitigates llvm#89720 CPP-5414
While investigating #89045 @steakhal noticed that the complexity of symbols seen by the ArrayBoundV2 checker increased significantly when that checker switched to using
check::PostStmt
callbacks instead of the previously usedcheck::Location
.That slowdown was fixed by #89606 which restored the efficiency of
getTaintedSymbolsImpl
and ensured that it runs quickly even when the symbols are complex. However, it would be still good to investigate the cause of this complexity increase.The text was updated successfully, but these errors were encountered: