-
Notifications
You must be signed in to change notification settings - Fork 232
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Miscellaneous improvements #7
Conversation
I got around to running the autobahn test suite. This PR introduces regressions. I will fix them when I have time and report back. |
rand::Rand has impl<T> Rand for [T; 16] where T: Rand so we don't need to simulate it ourselves.
This function is the most speed-critical in the library. In profiles, this optimization reduces it from ~75% of the profile to ~55%. I have tried several approaches, but didn't manage to improve on this one (LLVM already unrolls the loop here). Though I'm sure it is possible.
This is using some unsafe code and assumptions about alignment to speed up apply_mask() more. In profiles, this optimization reduces it from ~55% of the total runtime to ~7%.
Suggested by clippy: warning: use of `ok_or` followed by a function call --> src/handshake/server.rs:20:19 | 20 | let key = self.headers.find_first("Sec-WebSocket-Key") | ___________________^ starting here... 21 | | .ok_or(Error::Protocol("Missing Sec-WebSocket-Key".into()))?; | |_______________________________________________________________________^ ...ending here | = note: #[warn(or_fun_call)] on by default help: try this | let key = self.headers.find_first("Sec-WebSocket-Key").ok_or_else(|| Error::Protocol("Missing Sec-WebSocket-Key".into()))?; = help: for further information visit https://github.com/Manishearth/rust-clippy/wiki#or_fun_call
It is only used by the examples; libraries should only depend on the `log` crate.
6bac808
to
be834ac
Compare
OK, the autobahn test suite is green again. I removed a couple of commits. I can see why you went to the trouble with the Once rust-lang/rust#40212 stabilizes, it should be possible to drop the |
Thank you! Looks good. I'll test and merge it ASAP. You're mostly improving the code I copied from ws-rs to speed-up the development, and it really needed a cleanup. We have the same "early fail on invalid UTF-8" issue with frames too. That's why two tests are still non-strict. There is a possible way to fix it by letting the frame parser return incomplete frames. |
The current implementation uses the `remaining_mut()` function from the bytes::BufMut implementation for Vec<u8>. In terms of the BufMut trait, a Vec buffer has infinite capacity - you can always write more to the buffer, since the Vec grows as needed. Hence, the `remaining_mut` here actually returns +∞ (actually, `usize::MAX - len`). So the first `if` is always true, and the calls to `reserve` never actually allocate the appropriate space. What happens instead is that the `bytes_mut()` call in `read_from` picks up the slack, but it merely grows the buffer a 64 bytes at a time, which slows things down. This changes the check to use the Vec capacity instead of `remaining_mut`. In my profile (sending 100,000 10KiB messages back and forth), this reduces `__memmove_avx_unaligned_erms`'s share of the total runtime from ~77% to ~53%.
I added another commit fixing a problem with |
This PR seems to resolve an infinite memory consumption with high throughput. I am using throughput benchmark (./throughput 10000 20 10 3000) from uWebSockets against twists echo server and a mio/tungstenite echo server. |
Some changes I made while playing with this library. Please see the commit messages.