break when there's nothing to read #2582
Conversation
We see a huge performance drop when using the CPU for inference if CPU binding is enabled.
Signed-off-by: Wang, Yi A <yi.a.wang@intel.com>
This would break the debugger interactivity. What I understand you're doing here is simply cutting the reading from stdin (whenever there's nothing to read, which is most of the time). What I imagine the issue is: on CPU, there are a lot of context interrupts because it keeps trying to read from stdin. We could definitely protect it with:
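The protection being discussed could look roughly like this. This is a minimal Rust sketch, not the launcher's actual code; `debug_enabled` is a hypothetical stand-in for the launcher's real log-level check:

```rust
use std::io::{self, BufRead};

// Sketch: only forward stdin to the shard when running at debug level,
// so the non-interactive (production) path never polls stdin at all.
// Returns how many lines were forwarded, just to make the sketch testable.
fn forward_stdin_if_debug(debug_enabled: bool) -> usize {
    if !debug_enabled {
        return 0; // skip the stdin loop entirely outside of debug runs
    }
    let stdin = io::stdin();
    let mut forwarded = 0;
    for line in stdin.lock().lines() {
        match line {
            Ok(_l) => forwarded += 1, // would be written to the shard's stdin
            Err(_) => break,
        }
    }
    forwarded
}
```

With this guard, the stdin-reading loop never runs unless the user explicitly asked for debug interactivity.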
Here is the approach I'm talking about: #2596
Hi @Narsil, PR #2596 still has the issue on my side, since it deletes the `break` in the `log_lines` function, which reads stdout (https://github.com/huggingface/text-generation-inference/pull/2582/files#diff-deb98d21e32cf9a211b60ba46044f0ce4b6f75a1ec3adebc69185b7a845ecb18R1068-R1069). On my side, adding the `break` in `log_lines` does not affect the log output: if there's data in stdout, the loop that reads the output is still triggered, and I think it should be similar for stdin. Could you share how you use stdin, so I can verify on my side?
@Narsil, glad to know that CI for Intel CPU has been added. Yes, at least we can detect things like broken imports and broken APIs. cc @yao-matrix
Okay, I've set up a machine on AWS.
I don't think the other arguments you're using are necessary, are they? I'm mostly trying to reproduce, so we can make sure we keep the benefit of this stdin manipulation in debug mode while removing the bug on CPU.
`cpuset` is needed to reproduce it. Our validation team intended to launch two text-generation servers, one per socket via CPU binding, and then found this issue.
Can we recheck on master? I merged the protection behind log_level, which should limit the issue. Unfortunately, I'm definitely not able to reproduce on my end.
I still see the issue on master, and I must add the `break` in `log_lines<R: Sized + Read>(mut bufread: BufReader<R>)` to avoid it. I've just updated this PR to fix it. I have two CPU sockets in the node, each socket has 56 physical cores, and I just bind to socket 0 to run the inference.
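The break-on-EOF fix described above can be sketched as follows. This is an illustration of the idea, not the launcher's actual `log_lines`; the `Vec<String>` return value is added only to make the sketch testable (the real function would forward the lines to the logger):

```rust
use std::io::{BufRead, BufReader, Read};

// Sketch: `read_line` returning Ok(0) means the underlying stream hit EOF,
// so the logging loop should exit instead of spinning forever on it.
fn log_lines<R: Read>(mut reader: BufReader<R>) -> Vec<String> {
    let mut lines = Vec::new();
    loop {
        let mut line = String::new();
        match reader.read_line(&mut line) {
            Ok(0) => break, // EOF: nothing left to read, stop looping
            Ok(_) => lines.push(line.trim_end().to_string()),
            Err(_) => break,
        }
    }
    lines
}
```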
I'm really struggling to reproduce anything. I reproduced your command line with every argument (even though I don't understand why `--privileged --net host --ipc host` are actually required) and I still get 100 ms/token without any issue.

The "fix" cannot be a proper fix, because if it works it means it's deactivating the log facilities, which is not something we ever want to happen (having good logs is important).

Could it be a kernel flag, or a specific shell you're using? I tried to find out whether some targets are known to be nonblocking, but it seems that's not the case. If it's indeed nonblocking I/O that's causing the issue (which is the only thing that makes sense to me as to why the current fix would fix anything), then I'd favor fixes that actually force blocking I/O. Removing the logs is not OK.
Are you pinning CPU cores, like `--cpuset-cpus=0-55`? Or could you adjust the CPU cores? Maybe I could try a sleep instead of the break. But even when I break, I still see the log output in the terminal.
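The sleep-instead-of-break idea could look like this. This is a hypothetical sketch with an artificial `max_empty` cap so the example terminates; in the real launcher the loop would run until shutdown:

```rust
use std::io::{BufRead, BufReader, Read};
use std::time::Duration;

// Sketch: instead of breaking out of the loop on a zero-byte read,
// back off with a short sleep so the thread doesn't busy-spin on EOF.
fn log_lines_with_backoff<R: Read>(mut reader: BufReader<R>, max_empty: usize) -> Vec<String> {
    let mut lines = Vec::new();
    let mut empty_reads = 0;
    while empty_reads < max_empty {
        let mut line = String::new();
        match reader.read_line(&mut line) {
            Ok(0) => {
                empty_reads += 1;
                // back off instead of spinning; keeps CPU usage low
                std::thread::sleep(Duration::from_millis(1));
            }
            Ok(_) => lines.push(line.trim_end().to_string()),
            Err(_) => break,
        }
    }
    lines
}
```

Note that sleeping avoids the CPU burn but still leaves the thread alive, so it wouldn't help with the `join` hang discussed later in the thread.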
Indeed, I adapted it since I don't have the same number of cores. I tried to `numactl` the docker command too, to pin a specific socket.
This doesn't make sense. If you break, you should exit the loop therefore exiting our printing loop. |
As a side note, when I'm using ALL cores across both sockets, I get ~110 ms/token. Not sure if that value is normal, but I figure the socket crossing has overhead compensating for the speedup, maybe? Could it be a docker version issue on my end too? (27.1.1 with buildx.) Or something in the cgroups that defeats
Hi @Narsil, I debugged in my environment and found it's blocked in this thread: https://github.com/huggingface/text-generation-inference/blob/main/launcher/src/main.rs#L1266-L1268. I print "n" in `log_lines` and it's continuously "0", while the thread at https://github.com/huggingface/text-generation-inference/blob/main/launcher/src/main.rs#L946-L948 behaves correctly.
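The continuously-zero `n` is consistent with `Read::read` semantics: once a stream hits EOF, every subsequent call returns `Ok(0)`, so a loop without a break on zero spins forever. A small self-contained illustration:

```rust
use std::io::{Cursor, Read};

// Demo: after EOF, `read` keeps returning Ok(0) on every call, which is
// exactly the endless "n = 0" behavior observed in the launcher thread.
fn count_zero_reads(data: &[u8], attempts: usize) -> usize {
    let mut cursor = Cursor::new(data.to_vec());
    let mut buf = [0u8; 16];
    let mut zeros = 0;
    for _ in 0..attempts {
        if cursor.read(&mut buf).unwrap() == 0 {
            zeros += 1; // EOF reached: every subsequent read yields 0 bytes
        }
    }
    zeros
}
```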
💡💡💡💡💡 We're just never closing that thread, which we indeed should (regardless of the output actually). |
Does this fix it too: #2674?
It hangs in `t.join()`, since there's no `break` in the loop in `log_lines`.
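Why this hangs: `join` only returns once the spawned closure finishes, so a logging thread whose loop never breaks blocks shutdown forever. A minimal sketch, with a shared flag standing in for the missing break condition:

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;

// Sketch: `t.join()` blocks until the spawned loop exits. Without some
// break condition (here, the `stop` flag), the join would never return.
fn join_returns_after_loop_exits() -> bool {
    let stop = Arc::new(AtomicBool::new(false));
    let stop_clone = Arc::clone(&stop);
    let t = thread::spawn(move || {
        // This loop exits only when the flag is set; without any exit
        // condition it would run forever and `join` below would hang.
        while !stop_clone.load(Ordering::Relaxed) {
            thread::yield_now();
        }
    });
    stop.store(true, Ordering::Relaxed);
    t.join().is_ok()
}
```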
Oh right, let's go with your fix then. I'm still not sure this is technically correct given the docs: https://doc.rust-lang.org/std/io/trait.Read.html#tymethod.read. I'm really curious how to reproduce that.
docker --version |
I am also curious why the thread keeps looping while the process is closed.
I think it's OS-dependent more than CPU-dependent, no? I'm wondering how docker handles this, and whether the docker version could affect it too. The doc definitely says the behavior varies based on the target.
I think we use the same OS, since it's Ubuntu 22.04, see https://github.com/huggingface/text-generation-inference/blob/main/Dockerfile_intel#L127
What does this PR do?

Fixes # (issue)

Before submitting
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.