socket.cc recv() timeout #156
Comments
Oh interesting - if you start a server w/ no clients and let it run for 1-5min; does the receive fail as well? If not, I don't think it comes from timing. Also on what OS are you running into this? And no, that Cheers |
Okay actually, just spinning a server is not enough; I added a Any special thing about your set-up? Cheers |
I worked on Ubuntu 22.04 KVM, hmm, I'm gonna test it with sleep(). |
I tested with a sleep(5*60*1000);, the same error occurred. strace result is as follows:
recvfrom(8, 0x563e8545c340, 4, 0, NULL, NULL) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
--- SIGALRM {si_signo=SIGALRM, si_code=SI_KERNEL} ---
rt_sigreturn({mask=[]}) = -1 EINTR (Interrupted system call)
|
I'll try to repro this this weekend then, thanks for all the details!
Cheers
|
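For readers following along: the strace output above shows a blocking recvfrom() being interrupted by SIGALRM and returning EINTR because the handler was installed without SA_RESTART. The sketch below shows one common way a receive loop can tolerate that. It is not taken from wtf's socket.cc; `RecvAll`, `OnAlarm` and `DemoInterruptedRecv` are hypothetical names, and the demo uses a local socketpair instead of a real client/server connection.

```c++
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <signal.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper (not wtf code): receive exactly Size bytes, retrying
// when a signal interrupts the blocking recv() call.
bool RecvAll(int Fd, void *Buffer, size_t Size) {
  auto *Cursor = static_cast<unsigned char *>(Buffer);
  size_t Left = Size;
  while (Left > 0) {
    const ssize_t Got = recv(Fd, Cursor, Left, 0);
    if (Got > 0) {
      Cursor += Got;
      Left -= static_cast<size_t>(Got);
      continue;
    }
    if (Got < 0 && errno == EINTR) {
      continue; // Interrupted by a signal (e.g. SIGALRM): just retry.
    }
    return false; // Peer closed the connection (0) or a real error (-1).
  }
  return true;
}

static volatile sig_atomic_t g_WriterFd = -1;

// Demo-only handler: plays the role of the slow peer by writing the awaited
// bytes from inside the handler (write() is async-signal-safe).
static void OnAlarm(int) {
  const int SavedErrno = errno;
  const unsigned char Msg[4] = {1, 2, 3, 4};
  const ssize_t Ignored = write(g_WriterFd, Msg, sizeof(Msg));
  (void)Ignored;
  errno = SavedErrno;
}

// Blocks in RecvAll(), gets interrupted by SIGALRM, then completes the read.
bool DemoInterruptedRecv() {
  int Fds[2];
  if (socketpair(AF_UNIX, SOCK_STREAM, 0, Fds) != 0) {
    return false;
  }
  g_WriterFd = Fds[1];

  struct sigaction Sa = {};
  Sa.sa_handler = OnAlarm; // No SA_RESTART, so recv() fails with EINTR.
  sigemptyset(&Sa.sa_mask);
  if (sigaction(SIGALRM, &Sa, nullptr) != 0) {
    return false;
  }

  itimerval Timer = {};
  Timer.it_value.tv_usec = 50 * 1000; // One-shot alarm after 50 ms.
  setitimer(ITIMER_REAL, &Timer, nullptr);

  unsigned char Out[4] = {};
  const bool Ok = RecvAll(Fds[0], Out, sizeof(Out));
  close(Fds[0]);
  close(Fds[1]);
  return Ok && Out[0] == 1 && Out[3] == 4;
}
```

Calling `DemoInterruptedRecv()` exercises the interrupted path end to end. Whether retrying is the right policy for a given caller is a design choice: retrying hides the signal from the receive path, while failing fast (as in the strace output above) surfaces it.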
Okay this is weird - I can't seem to repro this on an Ubuntu running in WSL2:
I wonder what's going on 🤔 |
My bad, it was me. I gave --limit 100 and thought it is only for execution time. |
Hmmmm, it is meant to prevent a testcase from running forever, and if the limit is hit
it should send a timeout status to the server. I might need to check that
code then, because it might be a bug of its own 🤔
|
Okay sorry for the lag, I just started to look into this. Do you remember which backend you were using? From what I can see in the code, the bxcpu backend stops executing and sends a timeout to the server (as expected), the KVM backend uses a SIGALRM to stop KVM, and WHV uses TimerQ_t to cancel the VCPU. Cheers |
I only tested on the KVM backend. |
Do you remember which side the above strace output was from? The
fuzzer side or the master?
Cheers
|
If I remember correctly, on the fuzzer side. |
Okay so that makes sense that the fuzzer side receives a SIGALRM; this
should stop KVM from running the testcase and the code will send a timeout
result to the server; but the server shouldn't stop.
Does this match what you saw? If so, then we're good :-D
Cheers
|
That's right, it matches what I saw :) |
Okay cool awesome! Thanks for sharing all those details with me, I think
this issue is resolved then. Let me know if you disagree, otherwise I'll
close it down.
Cheers
|
As in bochscpu, I thought about specifying the KVM/WHv backend (time) limit only for execution (not execution+mutation), but I think it will be fine as it is now. |
That's what it should be, at least that's the intended way. Generation of
the testcase happens before execution of the testcase, and the execution
length of the testcase is something hidden away by the backends, so there
shouldn't be any link 🤔
|
Yes, that's the issue here..
In the above scenario, SIGALRM does not occur while KVM is 'running' the testcase, but while the server is performing mutation (before KVM executes the testcase). |
OK something that might happen is that the alarm gets canceled and gets
triggered after the end of the testcase... I will investigate this further,
I might have an idea.
Thank you!
Cheers
|
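To illustrate the intent discussed above (the limit should only cover execution, not mutation or network waits), here is a minimal sketch of a guard that arms the real-time timer when execution starts and disarms it when execution ends. `ScopedAlarm_t`, `AlarmPending` and `DemoScopedAlarm` are hypothetical names, and this is an assumption about one way to scope an alarm, not how wtf's KVM backend actually does it.

```c++
#include <cassert>
#include <sys/time.h>

// Hypothetical RAII guard (not wtf code): the one-shot ITIMER_REAL timer is
// armed for the lifetime of the object only, so SIGALRM cannot be scheduled
// to fire outside of the guarded region.
class ScopedAlarm_t {
public:
  explicit ScopedAlarm_t(unsigned Seconds) {
    itimerval Timer = {};
    Timer.it_value.tv_sec = Seconds;
    setitimer(ITIMER_REAL, &Timer, nullptr);
  }

  ~ScopedAlarm_t() {
    // An all-zero it_value disarms the timer.
    const itimerval Disarm = {};
    setitimer(ITIMER_REAL, &Disarm, nullptr);
  }

  ScopedAlarm_t(const ScopedAlarm_t &) = delete;
  ScopedAlarm_t &operator=(const ScopedAlarm_t &) = delete;
};

// Returns true if a real-time timer is currently armed for this process.
bool AlarmPending() {
  itimerval Current = {};
  getitimer(ITIMER_REAL, &Current);
  return Current.it_value.tv_sec != 0 || Current.it_value.tv_usec != 0;
}

// Checks that the timer is armed only inside the guarded scope.
bool DemoScopedAlarm() {
  const bool Before = AlarmPending();
  bool During = false;
  {
    ScopedAlarm_t Limit(100); // Far in the future; never fires in this demo.
    During = AlarmPending();  // Testcase execution would happen here.
  }
  const bool After = AlarmPending();
  return !Before && During && !After;
}
```

Note that disarming the timer does not help if the signal was already delivered and left state behind, which is the direction the rest of this thread ends up investigating.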
Actually the weird thing is that even if the SIGALRM happens after, it just
turns a boolean to true:
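```c++
// Sets the immediate_exit flag on Run_; nothing else happens in the handler.
void KvmBackend_t::SignalAlarm() {
  __atomic_store_n(&Run_->immediate_exit, 1, __ATOMIC_RELAXED);
}

// Static entry point with the SA_SIGINFO handler signature; forwards the
// signal to the global backend instance.
void KvmBackend_t::StaticSignalAlarm(int, siginfo_t *, void *) {
  KvmBackend_t *KvmBackend = reinterpret_cast<KvmBackend_t *>(g_Backend);
  KvmBackend->SignalAlarm();
}
```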
Oh well, back to the drawing board 😅
Cheers
|
Okay, I set-up a VM w/ nested virtualization to try this out, and I played with it some more. At first, I thought I had a repro when simulating a long At that point, I removed the
At this point I think there's definitely a bug related to the limit feature in KVM; maybe it comes from the way I handled SIGALRM or something. Maybe the signal from testcase n-1 is triggered while testcase n is executing, which shuts it down earlier than it should. Anyways, now that I have a repro I can investigate :) Cheers |
Boy, this is a bad bug actually. The issue might actually be that I never reset If you still have your environment to repro, I'd love it if you could verify that the below patch fixes your issue:
--- a/src/wtf/kvm_backend.cc
+++ b/src/wtf/kvm_backend.cc
@@ -1454,7 +1454,7 @@ std::optional<TestcaseResult_t> KvmBackend_t::Run(const uint8_t *Buffer,
   Stop_ = false;
   TestcaseRes_ = Ok_t();
   Coverage_.clear();
-
+  Run_->immediate_exit = 0;
   while (!Stop_) {
It fixes it on my test bench:
Cheers |
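The effect of the patch above can be modeled without KVM. The sketch below is a toy stand-in, not the real backend: `FakeBackend_t` and its members are hypothetical, and `ImmediateExit` only imitates the role of `Run_->immediate_exit`. It shows why a flag set by a late alarm from the previous testcase cuts the next one short unless it is cleared at the start of `Run`.

```c++
#include <atomic>
#include <cassert>

// Toy model (not wtf code) of a backend whose run loop stops as soon as an
// "immediate exit" flag is set by the alarm handler.
struct FakeBackend_t {
  std::atomic<int> ImmediateExit{0}; // Stands in for Run_->immediate_exit.
  bool ResetOnRun;                   // true = behavior with the patch.

  explicit FakeBackend_t(bool Reset) : ResetOnRun(Reset) {}

  // What the SIGALRM handler does: only set the flag.
  void SignalAlarm() { ImmediateExit.store(1, std::memory_order_relaxed); }

  // Runs up to Steps iterations and returns how many actually executed.
  int Run(int Steps) {
    if (ResetOnRun) {
      // Clear any stale request left over from a previous testcase.
      ImmediateExit.store(0, std::memory_order_relaxed);
    }
    int Done = 0;
    while (Done < Steps &&
           ImmediateExit.load(std::memory_order_relaxed) == 0) {
      ++Done;
    }
    return Done;
  }
};

// Simulates an alarm that fires between two testcases and returns how many
// steps the following testcase gets to execute.
int StepsAfterLateAlarm(bool ResetOnRun) {
  FakeBackend_t Backend(ResetOnRun);
  Backend.SignalAlarm(); // Late alarm, e.g. while waiting on the server.
  return Backend.Run(10);
}
```

With `ResetOnRun` set to false the next testcase executes zero steps, and with it set to true it executes all ten. In the real backend the equivalent of the early exit would be the testcase ending immediately, typically reported as a timeout.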
FYI I will land #161 & close this issue next week if I don't hear anything back :) Cheers |
I implemented a custom mutator, and it takes about 1~5 mins for one mutate cycle.
When I run wtf with the custom mutator, I get the following error:
wtf/src/wtf/socket.cc
Lines 241 to 245 in 3ccad88
Is there any part of the code that sets a timeout or a flag such as O_NONBLOCK? Actually, I already checked O_NONBLOCK.
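For anyone wanting to rule out the same causes, the two things asked about here can be checked at runtime with standard POSIX calls: `fcntl(F_GETFL)` for O_NONBLOCK and `getsockopt(SO_RCVTIMEO)` for a receive timeout. The helper names below (`IsNonBlocking`, `HasRecvTimeout`, `DemoSocketChecks`) are hypothetical and the demo runs on a local socketpair rather than on wtf's actual socket.

```c++
#include <cassert>
#include <fcntl.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

// Returns true if the descriptor has O_NONBLOCK set.
bool IsNonBlocking(int Fd) {
  const int Flags = fcntl(Fd, F_GETFL, 0);
  return Flags != -1 && (Flags & O_NONBLOCK) != 0;
}

// Returns true if the socket has a non-zero receive timeout (SO_RCVTIMEO).
// A zero timeval means recv() may block indefinitely.
bool HasRecvTimeout(int Fd) {
  timeval Timeout = {};
  socklen_t Length = sizeof(Timeout);
  if (getsockopt(Fd, SOL_SOCKET, SO_RCVTIMEO, &Timeout, &Length) != 0) {
    return false;
  }
  return Timeout.tv_sec != 0 || Timeout.tv_usec != 0;
}

// A fresh socket is blocking with no timeout; after setting both options the
// helpers should report them.
bool DemoSocketChecks() {
  int Fds[2];
  if (socketpair(AF_UNIX, SOCK_STREAM, 0, Fds) != 0) {
    return false;
  }
  const bool CleanBefore = !IsNonBlocking(Fds[0]) && !HasRecvTimeout(Fds[0]);

  const int Flags = fcntl(Fds[0], F_GETFL, 0);
  fcntl(Fds[0], F_SETFL, Flags | O_NONBLOCK);

  timeval Timeout = {};
  Timeout.tv_sec = 5;
  setsockopt(Fds[0], SOL_SOCKET, SO_RCVTIMEO, &Timeout, sizeof(Timeout));

  const bool SetAfter = IsNonBlocking(Fds[0]) && HasRecvTimeout(Fds[0]);
  close(Fds[0]);
  close(Fds[1]);
  return CleanBefore && SetAfter;
}
```

If both checks come back clean, as they did for the reporter with O_NONBLOCK, a failing recv() on a blocking socket is more likely caused by a signal interrupting the call than by a socket option.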