-
Notifications
You must be signed in to change notification settings - Fork 297
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BytesMut::freeze
is not always zero-cost as documented
#587
Comments
I don't think the realloc is necessary in the |
The details aren't fresh in my mind, but I think the reason it does into_boxed_slice is because there was no room to store the length and capacity. |
I think this can be pretty easily implemented by turning the |
Making it shared does require a new allocation too. Would it be worth checking if the length and capacity are the same, and so into_boxed_slice won't need to shrink? |
That's true, but it requires a much smaller allocation of a handful of bytes, which I think will be acceptable here. Reallocating my buffer of 64 KiB requires substantially more space and a much larger I can certainly see if the length and capacity are the same and use |
When we freeze a BytesMut, we turn it into a Vec, and then convert that to a Bytes. Currently, this happen using Vec::into_boxed_slice, which reallocates to a slice of the same length as the Vev if the length and the capacity are not equal. This can pose a performance problem if the Vec is large or if this happens many times in a loop. Instead, let's compare the length and capacity, and if they're the same, continue to handle this using into_boxed_slice. Otherwise, since we have a type of vtable which can handle a separate capacity, the shared vtable, let's turn our Vec into that kind of Bytes. While this does not avoid allocation altogether, it performs a fixed size allocation and avoids any need to memcpy.
When we freeze a BytesMut, we turn it into a Vec, and then convert that to a Bytes. Currently, this happen using Vec::into_boxed_slice, which reallocates to a slice of the same length as the Vev if the length and the capacity are not equal. This can pose a performance problem if the Vec is large or if this happens many times in a loop. Instead, let's compare the length and capacity, and if they're the same, continue to handle this using into_boxed_slice. Otherwise, since we have a type of vtable which can handle a separate capacity, the shared vtable, let's turn our Vec into that kind of Bytes. While this does not avoid allocation altogether, it performs a fixed size allocation and avoids any need to memcpy.
When we freeze a BytesMut, we turn it into a Vec, and then convert that to a Bytes. Currently, this happen using Vec::into_boxed_slice, which reallocates to a slice of the same length as the Vev if the length and the capacity are not equal. This can pose a performance problem if the Vec is large or if this happens many times in a loop. Instead, let's compare the length and capacity, and if they're the same, continue to handle this using into_boxed_slice. Otherwise, since we have a type of vtable which can handle a separate capacity, the shared vtable, let's turn our Vec into that kind of Bytes. While this does not avoid allocation altogether, it performs a fixed size allocation and avoids any need to memcpy.
When we freeze a BytesMut, we turn it into a Vec, and then convert that to a Bytes. Currently, this happen using Vec::into_boxed_slice, which reallocates to a slice of the same length as the Vev if the length and the capacity are not equal. This can pose a performance problem if the Vec is large or if this happens many times in a loop. Instead, let's compare the length and capacity, and if they're the same, continue to handle this using into_boxed_slice. Otherwise, since we have a type of vtable which can handle a separate capacity, the shared vtable, let's turn our Vec into that kind of Bytes. While this does not avoid allocation altogether, it performs a fixed size allocation and avoids any need to memcpy.
When we freeze a BytesMut, we turn it into a Vec, and then convert that to a Bytes. Currently, this happen using Vec::into_boxed_slice, which reallocates to a slice of the same length as the Vev if the length and the capacity are not equal. This can pose a performance problem if the Vec is large or if this happens many times in a loop. Instead, let's compare the length and capacity, and if they're the same, continue to handle this using into_boxed_slice. Otherwise, since we have a type of vtable which can handle a separate capacity, the shared vtable, let's turn our Vec into that kind of Bytes. While this does not avoid allocation altogether, it performs a fixed size allocation and avoids any need to memcpy.
When we freeze a BytesMut, we turn it into a Vec, and then convert that to a Bytes. Currently, this happen using Vec::into_boxed_slice, which reallocates to a slice of the same length as the Vev if the length and the capacity are not equal. This can pose a performance problem if the Vec is large or if this happens many times in a loop. Instead, let's compare the length and capacity, and if they're the same, continue to handle this using into_boxed_slice. Otherwise, since we have a type of vtable which can handle a separate capacity, the shared vtable, let's turn our Vec into that kind of Bytes. While this does not avoid allocation altogether, it performs a fixed size allocation and avoids any need to memcpy.
When converting a
BytesMut
intoBytes
viafreeze
, the operation can cause a re-allocation to occur because we callinto_boxed_slice
. This is unacceptable in some hot codepaths.The documentation says that “[t]he conversion is zero cost.” I don't believe that an allocation can be considered zero cost.
Example code:
If you compile this program and set a breakpoint on
realloc
, you'll see this (after the startup code):I believe in this case that's because the variant of the
BytesMut
isKIND_VEC
and we're then recreating theVec
and doing aninto
, which callsinto_boxed_slice
. This is also the case if one substitutesBytesMut
in the code forVec
, but in that case, there's no guarantee of that being zero-cost.I'm not sure what the best way forward here is, but it would be great if someone could propose an idea where this really is zero cost. I have a server where I need to read some data into a buffer and then transfer it using a Bytes via Hyper, and since this is a very hot codepath, I really want to avoid a reallocation, since the reallocation shows up in my performance graphs.
The text was updated successfully, but these errors were encountered: