Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Miscellaneous improvements #7

Merged
merged 9 commits into from
Apr 3, 2017
Merged

Conversation

bluetech
Copy link
Contributor

Some changes I made while playing with this library. Please see the commit messages.

@bluetech
Copy link
Contributor Author

I got around to running the autobahn test suite. This PR introduces regressions. I will fix them when I have time and report back.

@bluetech bluetech changed the title Miscellaneous improvements Miscellaneous improvements (WIP) Mar 25, 2017
rand::Rand has
    impl<T> Rand for [T; 16] where T: Rand
so we don't need to simulate it ourselves.
This function is the most speed-critical in the library. In profiles,
this optimization reduces it from ~75% of the profile to ~55%.

I have tried several approaches, but didn't manage to improve on this
one (LLVM already unrolls the loop here). Though I'm sure it is possible.
This is using some unsafe code and assumptions about alignment to speed
up apply_mask() more.

In profiles, this optimization reduces it from ~55% of the total runtime
to ~7%.
Suggested by clippy:

warning: use of `ok_or` followed by a function call
  --> src/handshake/server.rs:20:19
   |
20 |           let key = self.headers.find_first("Sec-WebSocket-Key")
   |  ___________________^ starting here...
21 | |             .ok_or(Error::Protocol("Missing Sec-WebSocket-Key".into()))?;
   | |_______________________________________________________________________^ ...ending here
   |
   = note: #[warn(or_fun_call)] on by default
help: try this
   |         let key = self.headers.find_first("Sec-WebSocket-Key").ok_or_else(|| Error::Protocol("Missing Sec-WebSocket-Key".into()))?;
   = help: for further information visit https://github.com/Manishearth/rust-clippy/wiki#or_fun_call
It is only used by the examples; libraries should only depend on the
`log` crate.
@bluetech bluetech force-pushed the misc-improvements branch from 6bac808 to be834ac Compare March 25, 2017 21:49
@bluetech bluetech changed the title Miscellaneous improvements (WIP) Miscellaneous improvements Mar 25, 2017
@bluetech
Copy link
Contributor Author

OK, the autobahn test suite is green again. I removed a couple of commits. I can see why you went to the trouble with the StringCollector now - autobahn really wants us to fail fast on invalid UTF-8.

Once rust-lang/rust#40212 stabilizes, it should be possible to drop the utf-8 crate and use str::from_utf8() again.

@agalakhov
Copy link
Member

Thank you! Looks good. I'll test and merge it ASAP. You're mostly improving the code I copied from ws-rs to speed-up the development, and it really needed a cleanup.

We have the same "early fail on invalid UTF-8" issue with frames too. That's why two tests are still non-strict. There is a possible way to fix it by letting the frame parser return incomplete frames.

The current implementation uses the `remaining_mut()` function from the
bytes::BufMut implementation for Vec<u8>.

In terms of the BufMut trait, a Vec buffer has infinite capacity - you
can always write more to the buffer, since the Vec grows as needed.
Hence, the `remaining_mut` here actually returns +∞ (actually,
`usize::MAX - len`).

So the first `if` is always true, and the calls to `reserve` never
actually allocate the appropriate space. What happens instead is that
the `bytes_mut()` call in `read_from` picks up the slack, but it merely
grows the buffer a 64 bytes at a time, which slows things down.

This changes the check to use the Vec capacity instead of
`remaining_mut`. In my profile (sending 100,000 10KiB messages back and
forth), this reduces `__memmove_avx_unaligned_erms`'s share of the
total runtime from ~77% to ~53%.
@bluetech
Copy link
Contributor Author

I added another commit fixing a problem with InputBuffer (tell me if this annoys you, and I will open a new PR instead).

@ffablia
Copy link

ffablia commented Mar 29, 2017

This PR seems to resolve an infinite memory consumption with high throughput.

I am using throughput benchmark (./throughput 10000 20 10 3000) from uWebSockets against twists echo server and a mio/tungstenite echo server.
Without this PR, both version are using more and more memory.
With this PR, both are using constant memory size.

@agalakhov agalakhov merged commit 65248f1 into snapview:master Apr 3, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants