Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[analyzer] CDM::CXXMethod mode CallDescription matching causes serious regressions and FNs #99628

Closed
steakhal opened this issue Jul 19, 2024 · 10 comments

Comments

@steakhal
Copy link
Contributor

Turns out the recent (#90974) CallDescription matching improvements can cause regressions when using standard library implementations where method we want to match comes from an implementation specific base class, such as in this example:
https://compiler-explorer.com/z/ErMbhxfPv

unsigned int sleep(unsigned int seconds);
namespace std {
class __mutex_base {
public:
  void lock();
};
class mutex : public __mutex_base{
public:
  void unlock();
  bool try_lock();
};
}

void top() {
  std::mutex m;
  m.lock();
  sleep(3); // caught by clang-18, FN for clang-19
  m.unlock();
}

In this case, the BlockInCriticalSectionChecker wants to match the std::mutex::lock and std::mutex::unlock member functions using the CDM::CXXMethod matching mode. However, one could only match the lock by std::__mutex_base::lock, causing a mismatch.

This issue is more generic and not entirely specific to this checker. I expect similar bugs to happen for any other checker using CDM::CXXMethod.

If we don't do about this, we are going to have a serious regression in the clang-19 release.

@llvmbot
Copy link
Member

llvmbot commented Jul 19, 2024

@llvm/issue-subscribers-clang-static-analyzer

Author: Balazs Benics (steakhal)

Turns out the recent (#90974) `CallDescription` matching improvements can cause regressions when using standard library implementations where method we want to match comes from an implementation specific base class, such as in this example: https://compiler-explorer.com/z/ErMbhxfPv ```c++ unsigned int sleep(unsigned int seconds); namespace std { class __mutex_base { public: void lock(); }; class mutex : public __mutex_base{ public: void unlock(); bool try_lock(); }; }

void top() {
std::mutex m;
m.lock();
sleep(3); // caught by clang-18, FN for clang-19
m.unlock();
}


In this case, the `BlockInCriticalSectionChecker` wants to match the `std::mutex::lock` and `std::mutex::unlock` member functions using the `CDM::CXXMethod` matching mode. However, one could only match the `lock` by `std::__mutex_base::lock`, causing a mismatch.

This issue is more generic and not entirely specific to this checker. I expect similar bugs to happen for any other checker using `CDM::CXXMethod`.

If we don't do about this, we are going to have a serious regression in the clang-19 release.
</details>

@steakhal
Copy link
Contributor Author

/CC @NagyDonat @gamesh411

@NagyDonat
Copy link
Contributor

Thanks for alerting me, I'll investigate this. I'd guess that I'll be able to create a fix within a few days.

@NagyDonat
Copy link
Contributor

NagyDonat commented Jul 22, 2024

As I started to examine the source code, I realized that this incorrect behavior is probably unrelated to the new matching modes, and I verified (on godbolt) that it was already present in various older clang versions. (EDIT: I misread the UI, the incorrect behavior does not appear in older clang)

Note that the "check that each qualified name part is present" logic was already present before my improvements (in commit fb299ca I moved it to a separate method, but I'm not changing it) and it's not influenced by the matching mode check (which is just an early return branch before the name validation). (Moreover BlockInCriticalSectionChecker was already describing std::mutex::lock with a qualified name before my PR #90974.)

It would be still good to improve this aspect of the analyzer, but instead of simply reverting to a well-established baseline, we'd need to develop a new, better logic for qualified name handling. (Note that the unit tests of the qualified name handling contain many FIXME comments, it would be good to also handle those if we need to touch this qualified name handling.)

@steakhal
Copy link
Contributor Author

Hm, I get what you say. But it popped up now, that I wanted to rebase to llvm/main. So for sure, something caused that since the last release. You can see this even on Compiler explorer, that we have a report change there.
I remember there were commits by @gamesh411 to the BlockInCriticalSectionChecker, maybe he can explain this report difference.

@NagyDonat
Copy link
Contributor

You can see this even on Compiler explorer, that we have a report change

Yes, now I see it -- previously I misread the UI and "checked" that changing clang 18.1 to older versions doesn't change anything.

I'll continue the investigations.

@NagyDonat
Copy link
Contributor

After digging into the git logs, it's clear that the regression was introduced by commit 705788c which was created on the 18th of March by @gamesh411.

Before that commit the checker used the extremely trivial call description const CallDescription LockFn{{"lock"}}, which would've matched any function or method named "lock" (including the intended std::mutex::lock). It would be very trivial to create a commit which restores that behavior -- but I'm not sure that it would improve the situation.

@NagyDonat
Copy link
Contributor

I'll create a quickfix patch that will fix this particular regression. Eventually we are planning to tweak the CallDescription interface for a more general solution, but right now it's low priority.

NagyDonat added a commit to Ericsson/llvm-project that referenced this issue Aug 1, 2024
Before commit 705788c the checker alpha.unix.BlockInCriticalSection
"recognized" the methods `std::mutex::lock` and `std::mutex::unlock`
with an extremely trivial check that accepted any function (or method)
named lock/unlock.

To avoid matching unrelated user-defined function, this was refined to a
check that also requires the presence of "std" and "mutex" as distinct
parts of the qualified name.

However, as llvm#99628 reported, there are standard library implementations
where some methods of `std::mutex` are inherited from an implementation
detail base class and the new code wasn't able to recognize these
methods, which led to emitting false positive reports.

As a workaround, this commit partially restores the old behavior by
omitting the check for the class name.

In the future, it would be good to replace this hack with a solution
which ensures that `CallDescription` understands inherited methods.
@steakhal
Copy link
Contributor Author

steakhal commented Aug 1, 2024

I'll create a quickfix patch that will fix this particular regression. Eventually we are planning to tweak the CallDescription interface for a more general solution, but right now it's low priority.

Totally agree with this. Thanks for taking care. TBH I completely forgot about this one.

steakhal pushed a commit that referenced this issue Aug 2, 2024
Before commit 705788c the checker alpha.unix.BlockInCriticalSection
"recognized" the methods `std::mutex::lock` and `std::mutex::unlock`
with an extremely trivial check that accepted any function (or method)
named lock/unlock.

To avoid matching unrelated user-defined function, this was refined to a
check that also requires the presence of "std" and "mutex" as distinct
parts of the qualified name.

However, as #99628 reported, there are standard library implementations
where some methods of `std::mutex` are inherited from an implementation
detail base class and the new code wasn't able to recognize these
methods, which led to emitting false positive reports.

As a workaround, this commit partially restores the old behavior by
omitting the check for the class name.

In the future, it would be good to replace this hack with a solution
which ensures that `CallDescription` understands inherited methods.
@steakhal
Copy link
Contributor Author

steakhal commented Aug 2, 2024

Mitigated by #99628

@steakhal steakhal closed this as completed Aug 2, 2024
tru pushed a commit that referenced this issue Aug 2, 2024
Before commit 705788c the checker alpha.unix.BlockInCriticalSection
"recognized" the methods `std::mutex::lock` and `std::mutex::unlock`
with an extremely trivial check that accepted any function (or method)
named lock/unlock.

To avoid matching unrelated user-defined function, this was refined to a
check that also requires the presence of "std" and "mutex" as distinct
parts of the qualified name.

However, as #99628 reported, there are standard library implementations
where some methods of `std::mutex` are inherited from an implementation
detail base class and the new code wasn't able to recognize these
methods, which led to emitting false positive reports.

As a workaround, this commit partially restores the old behavior by
omitting the check for the class name.

In the future, it would be good to replace this hack with a solution
which ensures that `CallDescription` understands inherited methods.

(cherry picked from commit 99ae2ed)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants