Signal timeout when locking task in getLongRunningOpCalls #8511

Yuhta · 2024-01-24T17:42:31Z

Summary:
If there is a problem (e.g. a deadlock) with the mutex in `Task`,
`getLongRunningOpCalls` would block forever waiting for it. In this case we
should signal the error and let the caller generate some form of alert.
This is important because we rely on `getLongRunningOpCalls` to detect blocking,
so the function itself should not block.

Differential Revision: D53048088
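A minimal C++ sketch of the bounded-wait pattern the summary describes, assuming the task's mutex is a `std::timed_mutex` (as in the `Driver.cpp` diff) and that the function returns `false` without populating its output when the lock times out (as the `Task.h` comment states). `TaskSketch`, `OpCallInfo`, and `addCall` are hypothetical stand-ins, not Velox's actual `Task` API.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical record of a running operator call (not a Velox type).
struct OpCallInfo {
  std::string opName;
  int64_t durationMs;
};

// Hypothetical stand-in for the per-task state guarded by a timed mutex.
class TaskSketch {
 public:
  // Returns false if the mutex cannot be acquired within lockTimeout; in
  // that case `out` is left untouched. Returns true on success.
  bool getLongRunningOpCalls(
      std::chrono::nanoseconds lockTimeout,
      std::vector<OpCallInfo>& out) {
    std::unique_lock<std::timed_mutex> lock(mutex_, std::defer_lock);
    if (!lock.try_lock_for(lockTimeout)) {
      // Possible deadlock or a very long hold: report instead of blocking.
      return false;
    }
    out = runningCalls_;  // populated only while the lock is held
    return true;
  }

  // Ordinary writers keep using a plain blocking acquire.
  void addCall(OpCallInfo info) {
    std::lock_guard<std::timed_mutex> l(mutex_);
    runningCalls_.push_back(std::move(info));
  }

 private:
  std::timed_mutex mutex_;
  std::vector<OpCallInfo> runningCalls_;
};
```

The output parameter plus `bool` return mirrors the signature shown in the `Task.h` snippet; returning an empty vector on timeout would make "no long-running calls" indistinguishable from "could not look".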


netlify · 2024-01-24T17:42:36Z

✅ Deploy Preview for meta-velox canceled.

🔨 Latest commit: `6396b5f`
🔍 Latest deploy log: https://app.netlify.com/sites/meta-velox/deploys/65b14c09dbfd0f0008da7c6e

facebook-github-bot · 2024-01-24T17:42:39Z

This pull request was exported from Phabricator. Differential Revision: D53048088

xiaoxmeng

@Yuhta LGTM. Thanks!

xiaoxmeng · 2024-01-24T17:51:46Z

velox/exec/Driver.cpp

@@ -194,7 +194,7 @@ void BlockingState::setResume(std::shared_ptr<BlockingState> state) {
        auto& driver = state->driver_;
        auto& task = driver->task();

-        std::lock_guard<std::mutex> l(task->mutex());
+        std::lock_guard<std::timed_mutex> l(task->mutex());


@Yuhta, do we see this problem in production? Thanks!

No, it's a preventive measure: first, to rule out the possibility that we deadlock here during detection (collecting the run time of op calls).
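Switching the mutex type does not force existing call sites to change: `std::lock_guard` only needs `lock()` and `unlock()`, which `std::timed_mutex` provides, so blocking acquisitions such as the one in `BlockingState::setResume` keep their behavior. The sketch below, with hypothetical helper names `holdLock` and `tryLockWithTimeout`, shows a stuck holder using a plain blocking guard while a detector's bounded wait gives up instead of hanging.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <mutex>
#include <thread>

// Ordinary path: a plain blocking guard, unchanged by the type switch.
// Sets `acquired` once it owns the mutex, then holds it to simulate a
// thread that is stuck while holding the task lock.
void holdLock(
    std::timed_mutex& m,
    std::atomic<bool>& acquired,
    std::chrono::milliseconds holdTime) {
  std::lock_guard<std::timed_mutex> l(m);  // waits indefinitely
  acquired.store(true);
  std::this_thread::sleep_for(holdTime);
}

// Detector path: this unique_lock constructor calls try_lock_for, so the
// wait is bounded. Returns whether the lock was obtained (and releases it).
bool tryLockWithTimeout(
    std::timed_mutex& m,
    std::chrono::milliseconds timeout) {
  std::unique_lock<std::timed_mutex> l(m, timeout);
  return l.owns_lock();
}
```

The timed acquisition must come from a thread that does not already own the mutex; calling `try_lock_for` on a `std::timed_mutex` the caller already holds is undefined behavior.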

mbasmanova · 2024-01-24T17:58:16Z

velox/exec/Task.h

+  /// Return false when the lock cannot be taken within the timeout, in that
+  /// case the result is not populated.  Return true if everything works well.
+  bool getLongRunningOpCalls(
+      std::chrono::nanoseconds lockTimeout,


What is the timeout? For some reason I can't find where it is specified.

It's the timeout for locking the per-task mutex. I will update the comment to make it more explicit.

But what value do we use for this timeout? Where can I see that?

It will be in D52921877, but that is still a work in progress. I am planning to use 10s.
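A hypothetical caller-side sketch, assuming the 10s value mentioned above (the real setting is planned for D52921877 and may differ). `std::chrono::seconds` converts implicitly and losslessly to `std::chrono::nanoseconds`, and a `false` return is turned into an alert rather than a hang. `checkTaskOrAlert` and `kTaskLockTimeout` are illustrative names, and the `getCalls` callback stands in for a call such as `task->getLongRunningOpCalls(lockTimeout, out)`; logging to `stderr` is a placeholder for whatever alerting the real caller uses.

```cpp
#include <cassert>
#include <chrono>
#include <cstdio>
#include <functional>

// Assumption: 10s, the value planned in the review discussion.
constexpr std::chrono::seconds kTaskLockTimeout{10};

// Hypothetical caller. `getCalls` returns false when the task lock was not
// acquired in time. Returns true when an alert was raised.
bool checkTaskOrAlert(
    const std::function<bool(std::chrono::nanoseconds)>& getCalls,
    std::chrono::nanoseconds lockTimeout = kTaskLockTimeout) {
  if (!getCalls(lockTimeout)) {
    // The result was not populated; surface the problem instead of
    // letting the detector itself block.
    std::fprintf(
        stderr,
        "task mutex not acquired within %lld ns\n",
        static_cast<long long>(lockTimeout.count()));
    return true;
  }
  return false;
}
```

Passing the timeout in as a parameter, rather than hard-coding it inside `Task`, leaves the choice of value to the detector, which matches the reply that the value lives in a separate diff.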

facebook-github-bot · 2024-01-24T22:37:33Z

This pull request has been merged in 147ff30.

conbench-facebook · 2024-01-24T22:58:57Z

Conbench analyzed the 1 benchmark run on commit 147ff303.

There weren't enough matching historic benchmark results to make a call on whether there were regressions.

The full Conbench report has more details.

facebook-github-bot added the CLA Signed label Jan 24, 2024

facebook-github-bot added the fb-exported label Jan 24, 2024

xiaoxmeng approved these changes Jan 24, 2024


xiaoxmeng requested a review from mbasmanova January 24, 2024 17:53

mbasmanova reviewed Jan 24, 2024


facebook-github-bot closed this in 147ff30 Jan 24, 2024

facebook-github-bot added the Merged label Jan 24, 2024
