-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix envoy crash during early shutdown #4937
Conversation
Signed-off-by: Elisha Ziskind <eziskind@google.com>
Signed-off-by: Elisha Ziskind <eziskind@google.com>
/** | ||
* @return whether the shutdown method has been called. | ||
*/ | ||
virtual bool isShutdown() PURE; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you know if the "don't call virtual functions in destructor" (e.g. https://www.artima.com/cppsource/nevercall.html) rule applies here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the pointer. I think the way I'm using it here is safe though - here is the relevant bit "Once a derived class destructor has run, the object's derived class data members assume undefined values, so C++ treats them as if they no longer exist. Upon entry to the base class destructor, the object becomes a base class object".
In this case, we're calling the virtual method (isShutdown) from the derived class destructor (InstanceImpl) not that of the base class (Instance) so the object should still be treated as an InstanceImpl and call the correct method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@htuch Actually clang-analyzer-optin.cplusplus.VirtualCall
check will warn about this (though, not an error now so it will pass) in clang-tidy CI. It doesn't seems the case?
In this case, we're calling the virtual method (isShutdown) from the derived class destructor (InstanceImpl) not that of the base class (Instance) so the object should still be treated as an InstanceImpl and call the correct method.
@eziskind if you make the virtual method called by the derived class destructor final
, then clang-analyzer won't complain.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
Signed-off-by: Elisha Ziskind <eziskind@google.com>
Signed-off-by: Elisha Ziskind <eziskind@google.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense to me. Thanks for fixing!
Signed-off-by: Elisha Ziskind <eziskind@google.com>
Signed-off-by: Elisha Ziskind <eziskind@google.com>
741c77b
to
d9cb9f4
Compare
Signed-off-by: Elisha Ziskind <eziskind@google.com>
Signed-off-by: Elisha Ziskind <eziskind@google.com>
Signed-off-by: Elisha Ziskind <eziskind@google.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
Another heap-user-after-free, this time we were missing a fix that had been applied to server.h but not config_validation/server.h (envoyproxy#4940). While working on this, attempted to make fully consistent and as simple/clear as possible the constraints on member ordering. This PR is in the tradition (!) of envoyproxy#5847, envoyproxy#4940, envoyproxy#4937. I think long-term we might want to think of more dynamic and explicit declaration of ordering constraints, it's evidently pretty fragile. Also, the lack of single-source-of-truth and duplication across prod and config server code bites again. Fixes oss-fuzz issue https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=13228. Risk level: Low Testing: Corpus entry added. Signed-off-by: Harvey Tuch <htuch@google.com>
Another heap-user-after-free, this time we were missing a fix that had been applied to server.h but not config_validation/server.h (#4940). While working on this, attempted to make fully consistent and as simple/clear as possible the constraints on member ordering. This PR is in the tradition (!) of #5847, #4940, #4937. I think long-term we might want to think of more dynamic and explicit declaration of ordering constraints, it's evidently pretty fragile. Also, the lack of single-source-of-truth and duplication across prod and config server code bites again. Fixes oss-fuzz issue https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=13228. Risk level: Low Testing: Corpus entry added. Signed-off-by: Harvey Tuch <htuch@google.com>
…#6023) Another heap-user-after-free, this time we were missing a fix that had been applied to server.h but not config_validation/server.h (envoyproxy#4940). While working on this, attempted to make fully consistent and as simple/clear as possible the constraints on member ordering. This PR is in the tradition (!) of envoyproxy#5847, envoyproxy#4940, envoyproxy#4937. I think long-term we might want to think of more dynamic and explicit declaration of ordering constraints, it's evidently pretty fragile. Also, the lack of single-source-of-truth and duplication across prod and config server code bites again. Fixes oss-fuzz issue https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=13228. Risk level: Low Testing: Corpus entry added. Signed-off-by: Harvey Tuch <htuch@google.com> Signed-off-by: Fred Douglas <fredlas@google.com>
Description: This fixes a crash during early shutdown that occurs in the InstanceImpl destructor when it frees the ListenerManagerImpl object. PR #4244 tried to fix this but we're still seeing this periodically. In the crash scenario, the RdsRouteConfigSubscription destructor calls the initialize_callback_ function which ends up calling the InitManager initialize lambda created in the RunHelper constructor. That lambda has a call to workers_start_cb which is protected by a check on the RunHelpers shutdown_ field. However at the point this lambda is executed the RunHelper object has already been destructed (in InstanceImpl::run) so the shutdown_ field is dangling and can contain arbitrary content. In this case we ended up falling through and calling the workers_start_cb function which called InstanceImpl::startWorkers which caused the crash because InstanceImpl::listener_manager_ was null. This fix avoids the crash by making the lambda passed to the InitManager::initialize method not bind to the RunHelper object since it may not exist at the time the lambda is executed. Instead keep the shutdown state in the InstanceImpl object itself.
Risk Level: low
Testing: unit testing