Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a Bytes array based send interface to quinn-proto #1013

Merged
merged 2 commits into from
Mar 1, 2021

Conversation

Matthias247
Copy link
Contributor

Add a Bytes array based send interface to quinn-proto

This change modifies quinn-proto to allow callers to submit a list
of one or more owned Bytes chunks for transmission instead of
pure byte slices. This will provide a more efficient zero-copy
interface for applications which already make use of owned
Bytes buffers.

The internals of quinn-proto have been refactored in order to keep
the ability to pass &[u8] buffers and defer the conversion into
Bytes as long as possible, in order to avoid unnecessary allocations
if no data can be stored due to flow control.

@Matthias247 Matthias247 changed the title Bytes write Add a Bytes array based send interface to quinn-proto Feb 10, 2021
@Matthias247 Matthias247 changed the title Add a Bytes array based send interface to quinn-proto Add a Bytes array based send interface to quinn-proto Feb 10, 2021
Copy link
Member

@djc djc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the direction here! A bunch of questions about the details...


use bytes::Bytes;

/// A source of one or more buffers which can be converted into `Bytes` buffers on demand.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By convention, this trait definition should go at the bottom of the module. Also, please drop the periods from the first sentence of each comment.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The trait definition is still at the top, and the Written definition should go beneath it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I wrote in the first comment in this thread, the trait definition should go at the bottom of the module.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I think I misunderstood. I put it beyond Written since I thought it would still be more important to most readers than the implementations, which are used only in a tiny set of places.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can move it down

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I'm confused now. Either we move all dependencies down (Written + BytesSource), and keep the less important internal implementations on the top. Or we keep it as currently is. But I'm not sure if you want

  • Written (publically exported)
  • BytesArray
  • ByteSlice
  • BytesSource

?

Can you specify which order you intent?

@djc
Copy link
Member

djc commented Feb 10, 2021

Also, do you have any benchmarking results for what this API enables?

@Matthias247
Copy link
Contributor Author

Also, do you have any benchmarking results for what this API enables?

Not really, since I haven't started working on the quinn API yet. Based on the flamegraphs I would say maybe 2% if the application already has data in Bytes format - which is true for my use-case.

But there is another benefit that the data copy and buffer allocation no longer happens inside the Connection mutex, which could have an additional impact.

@Matthias247 Matthias247 force-pushed the bytes_write branch 2 times, most recently from c1902ff to e417557 Compare February 10, 2021 21:26
@Matthias247 Matthias247 force-pushed the bytes_write branch 3 times, most recently from 7c8cbdd to 47ce9fc Compare February 11, 2021 20:39
@Matthias247
Copy link
Contributor Author

The new clippy warning seems like a rustc 1.50 artifact

Copy link
Member

@djc djc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comments.

@Matthias247 Matthias247 force-pushed the bytes_write branch 2 times, most recently from c1265a1 to e420412 Compare February 16, 2021 01:42
@djc
Copy link
Member

djc commented Feb 16, 2021

@Ralith any thoughts?

Copy link
Collaborator

@Ralith Ralith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a good direction. However, I'm concerned about how complex the new trait is. It's not part of the public interface so it's not the end of the world, but I do think we can simplify it considerably. Concrete suggestions follow.

// Returns the total amount of bytes to write
fn len(&self) -> usize;
/// Limits the source to the new size
fn limit(&mut self, new_len: usize);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than an extra method and associated state, could we add a max_len: usize to pop_chunk? We'd have to pass a limit from Streams to streams::Send, but that seems inoffsensive.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It can be. But then we have to subtract the limit in Stream::write based on what was already written. I feel like then actually more code with less test coverage is tasked with keeping track of limits. This way its fairly concentrated in one place.

Copy link
Collaborator

@Ralith Ralith Feb 22, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like the current approach has to duplicate logic across both impls, whereas moving it out of the trait would yield a single implementation, and a trait that's simpler to reason about besides.

/// Limits the source to the new size
fn limit(&mut self, new_len: usize);
/// The amount of bytes and chunks consumed from this source
fn consumed(&self) -> Written;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a benefit to this being tracked by the source? Intuitively it seems like the accumulators could easily be maintained source-independently in the ultimate fn write implementation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Kind of the same answer as for limits. It more or less requires all intermediate code to track the same information that the source already tracks. And we can't drop the information from there, since it also requires it to determine the offsets of the next data to write.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd think there's even less communication between intermediate layers here than in the limit case. We have exactly one location that ever calls pop_chunk, and it can very naturally return the Written counters back up the stack.

Copy link
Contributor Author

@Matthias247 Matthias247 Feb 22, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is an attempt: https://github.com/quinn-rs/quinn/compare/main...Matthias247:bytes_write_2?expand=1

Ignore most of the ByteSource implementation. It's not updated and broken, just has the updated API.
Most things seem to work fine. However the loop in Send::write is broken: It can't properly populate Written::chunks, since it doesn't know whether pop_chunk returned a full chunk or whether it was trunated due to the limit.

In order to fix this, we we would need to add either a bool to pop_chunk which indicates if this consumed a full buffer (which is however somewhat weird, since the internal buffer structures are transparent to the caller), or return a Option<(Written, Bytes)> from pop_chunk.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So returning Option<(Written, Bytes)> wasn't optimal either, because it didn't allow to indicate we consumed chunks in the source if no Bytes elements had been pulled (which can happen in case empty Bytes elements are enqueued). It's obviously a borderline thing of invalid input, but I still think we need to handle it correctly.

One option to fix it is to change the return value to (Option<Bytes>, Written) or (Option<Bytes>, usize), so that consumed chunks can be reported independent of pulling a chunk. I updated for this in the last version of the linked other variant.

I'm split on whether I like it better than this approach here. Yes - the trait is smaller. However returning a consumed chunks number is weird, since the trait nowhere else mentions multiple chunks.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new branch actually looks pretty good to me in its current state. The simpler interface makes the semantics a lot more obvious, and only mentioning multiple chunks in one place isn't obscure when there's only one method on the trait to begin with. I agree with the decision re: return values, as well, except that I wonder if Option<Bytes> could be simplified to Bytes by returning an empty Bytes when appropriate.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok. I will then update to it. I had the same thought about Bytes::new(). It might be actually be a bit more efficient. But on the other hand the explicit None is a good signal to stop iteration. I have no strong thoughts on it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I now updated for Bytes::new since I think it looks a bit more concise on the consumer side.

@Matthias247 Matthias247 force-pushed the bytes_write branch 4 times, most recently from 3366a97 to a6314c0 Compare February 25, 2021 04:51
Ralith
Ralith previously approved these changes Feb 25, 2021

/// Indicates how many bytes and chunks had been transferred in a write operation
#[derive(Debug, Default, PartialEq, Eq, Clone, Copy)]
pub struct Written {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Post-refactor, this should probably be defined in connection::streams::send, the sole location where it's constructed.

/// The type allows to dequeue a single [`Bytes`] chunk, which will be lazily
/// created from a reference. This allows to defer the allocation until it is
/// known how much data needs to be copied.
pub struct ByteSlice<'a> {
Copy link
Collaborator

@Ralith Ralith Feb 25, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cheeky bonus points if you can get rid of the struct and impl on &[u8] directly. Would make the tests a bit more concise!

Matthias Einwag added 2 commits February 28, 2021 17:43
This changes the interface from SendBuffer to directly accept owned
`Bytes` buffers, instead of converting to them.
This change modifies quinn-proto to allow callers to submit a list
of one or more owned `Bytes` chunks for transmission instead of
pure byte slices. This will provide a more efficient zero-copy
interface for applications which already make use of owned
`Bytes` buffers.

The internals of quinn-proto have been refactored in order to keep
the ability to pass `&[u8]` buffers and defer the conversion into
`Bytes` as long as possible, in order to avoid unnecessary allocations
if no data can be stored due to flow control.
@djc djc merged commit 0020e1e into quinn-rs:main Mar 1, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants