Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Features to support musli-zerocopy #523

Open
joshlf opened this issue Oct 19, 2023 · 7 comments
Open

Features to support musli-zerocopy #523

joshlf opened this issue Oct 19, 2023 · 7 comments
Labels
customer-request Documents customer requests.

Comments

@joshlf
Copy link
Member

joshlf commented Oct 19, 2023

See https://udoprog.github.io/rust/2023-10-19/musli-zerocopy.html and https://docs.rs/musli-zerocopy/latest/musli_zerocopy/.

cc @udoprog

@joshlf
Copy link
Member Author

joshlf commented Oct 19, 2023

@udoprog re: bytes to type conversion: You can usually accomplish this by constructing a Ref and then using into_ref, into_mut, into_slice, or into_mut_slice. The ergonomics are definitely bad, and we need to improve them.

@joshlf joshlf added the customer-request Documents customer requests. label Oct 19, 2023
@udoprog
Copy link

udoprog commented Oct 19, 2023

Hi!

I'll go through some of the major areas I've yet to find a solution for:

  1. Types that can be coerced through validation. There is no trait to coerce NonZero* or char directly and FromBytes extends FromZeros effectively preventing it and all structures which depend on it. Ref does, so it's also excluded.

  2. General APIs for working with padding in any buffer. If I had my pick I think you should adopt something like my API wholesale. These are the relevant parts:

pub struct Padder<'a, T: ?Sized> {
    ptr: *mut u8,
    offset: usize,
    _marker: PhantomData<&'a mut T>,
}

impl<'a, T: ?Sized> Padder<'a, T> {
    pub(crate) fn new(ptr: *mut u8) -> Self {
        Self {
            ptr,
            offset: 0,
            _marker: PhantomData,
        }
    }

    /* methods that derived structures uses to indicate which fields and discriminators are present */
 }

trait UnsizedPaddable {
    unsafe fn pad(&self, pad: &mut Padder<'_, Self>);
}

trait Paddable {
    unsafe fn pad(&self, pad: &mut Padder<'_, Self>);
}

That way, anything that can provide a mutable pointer can have padding bytes written to it, like a Vec<u8>. It also follows that you ought to expose somehow whether a type T requires padding (I use T::PADDED and it optimizes really well). You might opt to make it generic through some trait, all though I personally wouldn't need that and can't think of a case where it is needed. There is quite a bit of subtlety (which is echoed in the actual Padded api). E.g. to determine how the instance of an enum is padded you have to load the discriminant, and since the enum might be part of a packed type the load has to be unaligned.

To use such an API, the caller is resonsible for ensuring that a buffer with capacity size_of::<T>() is available somewhere where the bytes of T has been copied to so that the padding can be filled in by the abstraction. This also makes it unsafe.

  1. APIs for working with unsized types. Unfortunately everything needed to support it well is unstable (ptr_metadata, align_of_val_raw), so this is super messy and has to be specialized over the types that have metadata constructors (str and [T]). I'm not sure whether that's something a generic library wants to incorporate right now, but you can get a feel for how I did it with my own Pointee trait mirroring that in core.

Again, please advise if there's something about zerocopy I've missed. And if there's question or push back feel free to push. I'll be happy to detail more things I can think of in the future.

@joshlf
Copy link
Member Author

joshlf commented Oct 19, 2023

Hi!

I'll go through some of the major areas I've yet to find a solution for:

  1. Types that can be coerced through validation. There is no trait to coerce NonZero* or char directly and FromBytes extends FromZeros effectively preventing it and all structures which depend on it. Ref does, so it's also excluded.

Does #5 address this use case?

  1. General APIs for working with padding in any buffer. If I had my pick I think you should adopt something like my API wholesale. These are the relevant parts:
pub struct Padder<'a, T: ?Sized> {
    ptr: *mut u8,
    offset: usize,
    _marker: PhantomData<&'a mut T>,
}

impl<'a, T: ?Sized> Padder<'a, T> {
    pub(crate) fn new(ptr: *mut u8) -> Self {
        Self {
            ptr,
            offset: 0,
            _marker: PhantomData,
        }
    }

    /* methods that derived structures uses to indicate which fields and discriminators are present */
 }

trait UnsizedPaddable {
    unsafe fn pad(&self, pad: &mut Padder<'_, Self>);
}

trait Paddable {
    unsafe fn pad(&self, pad: &mut Padder<'_, Self>);
}

That way, anything that can provide a mutable pointer can have padding bytes written to it, like a Vec<u8>. It also follows that you ought to expose somehow whether a type T requires padding (I use T::PADDED and it optimizes really well). You might opt to make it generic through some trait, all though I personally wouldn't need that and can't think of a case where it is needed. There is quite a bit of subtlety (which is echoed in the actual Padded api). E.g. to determine how the instance of an enum is padded you have to load the discriminant, and since the enum might be part of a packed type the load has to be unaligned.

To use such an API, the caller is resonsible for ensuring that a buffer with capacity size_of::<T>() is available somewhere where the bytes of T has been copied to so that the padding can be filled in by the abstraction. This also makes it unsafe.

I think I roughly understand what's going on here, but can you say more about how this API is used? We have #494 which it sounds like describes something similar, but I'm not sure if it'd fully cover your use case.

  1. APIs for working with unsized types. Unfortunately everything needed to support it well is unstable (ptr_metadata, align_of_val_raw), so this is super messy and has to be specialized over the types that have metadata constructors (str and [T]). I'm not sure whether that's something a generic library wants to incorporate right now, but you can get a feel for how I did it with my own Pointee trait mirroring that in core.

Yeah, we're feeling this pain too. I've submitted rust-lang/reference#1417, which should provide enough of a guarantee that you can build unsized type support on top of it (basically, you use slice_from_raw_parts to construct a *const [()] of the appropriate length and then cast it to *const T where T is a slice DST). We're planning on supporting slice DST conversions once that PR lands.

Again, please advise if there's something about zerocopy I've missed. And if there's question or push back feel free to push. I'll be happy to detail more things I can think of in the future.

@udoprog
Copy link

udoprog commented Oct 19, 2023

I think I roughly understand what's going on here, but can you say more about how this API is used? We have #494 which it sounds like describes something similar, but I'm not sure if it'd fully cover your use case.

That is the same as my ZeroCopy::to_bytes. You're on the right track, but it's a bit too high level to incorporate well into buffers.

You can actually see how the proposed method in #494 could be implemented using Padder / pad / T::PADDED here: https://docs.rs/musli-zerocopy/latest/src/musli_zerocopy/traits.rs.html#447. And I believe #494 is also why I got the impression that padding APIs was on your roadmap.

@udoprog
Copy link

udoprog commented Oct 19, 2023

"Well" as-in, the ability to pad something solely based on a &T. The proposed API in #494 would require you to make a local mutable aligned copy out of it somewhere. Byte buffers are not always aligned, so copying it into a buffer and then coercing into &mut T to pipe it through that API is probably not an option.

@joshlf
Copy link
Member Author

joshlf commented Oct 19, 2023

Okay gotcha, thanks for clarifying! IIUC, the point is that you need to be able to take a &T where the T might contain padding and copy its bytes into a target buffer while initializing any padding bytes along the way, but of course you can't do that in the &T itself because it's not mutable. Is that right?

I've updated #494 to mention this use case.

@joshlf
Copy link
Member Author

joshlf commented Oct 28, 2023

@udoprog Just wanted to let you know that we added convenience constructors for bytes-to-type conversions like ref_from in #526; these are available in the latest published version.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
customer-request Documents customer requests.
Projects
None yet
Development

No branches or pull requests

2 participants