Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Envoy is spinning CPU on Windows #13189

Closed
davinci26 opened this issue Sep 18, 2020 · 9 comments · Fixed by #13787
Closed

Envoy is spinning CPU on Windows #13189

davinci26 opened this issue Sep 18, 2020 · 9 comments · Fixed by #13787
Assignees
Labels
area/windows design proposal Needs design doc/proposal before implementation

Comments

@davinci26
Copy link
Member

davinci26 commented Sep 18, 2020

Title: Envoy is spinning CPU on Windows

Description:

In 2019 version of Windows Server there is no support for level-based events which causes Envoy to spin CPU to process "empty" write ready events.

To resolve this issue we consider the following three approaches:

  1. Selectively turn on and off the level based events. The idea is to only listen to file event notifications when we want to write to the socket (either because the previous write call got blocked or new data came in).
  2. Change the socket implementation to use the I/O completion ports paradigm.
  3. Rely on a different backend (see libevent replacement with libev #4952)

cc: @envoyproxy/windows-dev

@davinci26 davinci26 added triage Issue requires triage design proposal Needs design doc/proposal before implementation area/windows labels Sep 18, 2020
@davinci26
Copy link
Member Author

Given that the fact that newer versions of Windows will support edge-based events and that it requires the least amount of changes, I want to start experimenting with approach (1).

I got some input from @antoniovicente on the approach and I will start working on it for the beta release.

@antoniovicente
Copy link
Contributor

It would be good to see if approach (1) works out. It certainly seems the least disruptive. The level trigger mode of operation could also be an option under linux, so testing sockets that operate in this paradigm should be possible on multiple platforms. I hope that the changes can be relatively localized: either Network::Connection and/or IoHandle.

Keep me posted on progress.

@davinci26 davinci26 self-assigned this Sep 19, 2020
@yanavlasov yanavlasov removed the triage Issue requires triage label Sep 22, 2020
@yanavlasov
Copy link
Contributor

Don't we already subscribe to write ready events when we have data to write?

@davinci26
Copy link
Member Author

Don't we already subscribe to write ready events when we have data to write?
I missed a only in my initial post (I will edit for clarity).

The high level concept is to optimize the time that we are subscribed to write-ready events. The idea is to only listen for write ready events on specific conditions Like when the previous write call to the socket got blocked or when we have data to the buffer.

@mattklein123
Copy link
Member

Don't you have the same problem with read events when we don't consume the entire read buffer?

@antoniovicente
Copy link
Contributor

Don't you have the same problem with read events when we don't consume the entire read buffer?

In the current implementation, the call to readDisable(true) when above high watermark (or after we have a full H1 request message) removes read from the fd registration mask, so we already avoid having spurious read upcalls.

Of course there's this change I want to make once we have benchmarks that measure impact. The removal of spurious changes to the fd registraion mask should provide performance gains. master...antoniovicente:optimize_read_disable

@mattklein123
Copy link
Member

Of course there's this change I want to make once we have benchmarks that measure impact. The removal of spurious changes to the fd registraion mask should provide performance gains. master...antoniovicente:optimize_read_disable

Ah OK, cool. I had assumed we already did that. :)

@piscisaureus
Copy link

piscisaureus commented Nov 1, 2020

In lieu of windows/wepoll support for EPOLLET, I'd suggest to use EPOLLONESHOT instead.

EPOLLONESHOT is similar to EPOLLET in the sense that when an event (like EPOLLIN) is reported, it is also automatically disabled.

The difference is that in edge triggered mode the event is also automatically turned back on after calling read()/write(), whereas in ONESHOT mode this needs to be done "manually" by envoy.

So every time you're "done" reading from/writing to a socket (typically that is when you get a EWOULDBLOCK error, although there can be other reasons to "move on" from interacting with a particular socket), you have to re-enable the EPOLLIN/EPOLLOUT event with epoll_ctl().

It might require a bit of work but it should be straightforward, almost mechanical.

W.r.t wepoll performance - this change would not introduce additional overhead. In default (level-triggered) mode wepoll is doing exactly the same thing internally.

I'd suggest to to avoid hacks that switch level-triggered events on and off all the time. It's a lot more complicated, and it can make wepoll a bit slower (that depends on the exact epoll_ctl()/epoll_wait() call pattern).

@davinci26, @antoniovicente

@davinci26
Copy link
Member Author

@piscisaureus thanks a lot for your input! Highly appreciated.

EPOLLONESHOT is similar to EPOLLET in the sense that when an event (like EPOLLIN) is reported, it is also automatically disabled. The difference is that in edge triggered mode the event is also automatically turned back on after calling read()/write(), whereas in ONESHOT mode this needs to be done "manually" by envoy.

Conceptually this is pretty similar to what we are trying to do. This is good validation for what we are trying to achieve.

you have to re-enable the EPOLLIN/EPOLLOUT event with epoll_ctl()

Does libevent give you access to modify the flags of the backend? I have to investigate that further. @nigriMSFT do you happen to know?

In default (level-triggered) mode wepoll is doing exactly the same thing internally.

I am not sure I fully understand what you mean here.

I'd suggest to to avoid hacks that switch level-triggered events on and off all the time. It's a lot more complicated, and it can make wepoll a bit slower (that depends on the exact epoll_ctl()/epoll_wait() call pattern).

I have implemented this as part of #13787. But I do not see much extra complication. Am I missing something?

@wrowe wrowe removed this from the windows-ga milestone Jan 15, 2021
@davinci26 davinci26 mentioned this issue Mar 5, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/windows design proposal Needs design doc/proposal before implementation
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants