Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

encodeVarint64 performance improvement preallocate ArrayBuffer #31

Open
jamesdbrock opened this issue Jun 24, 2021 · 3 comments
Open

encodeVarint64 performance improvement preallocate ArrayBuffer #31

jamesdbrock opened this issue Jun 24, 2021 · 3 comments

Comments

@jamesdbrock
Copy link
Member

In encodeVarint64, instead of creating a separate ArrayBuffer for each byte, we could preallocate an ArrayBuffer with the right number of bytes for the entire varint by examining its value. And the same for encodeVarint32.

https://github.com/xc-jp/purescript-protobuf/blob/b4ddb67b72b4a3db8b5e5f6ec046182a802e18aa/src/Protobuf/Encode.purs#L311-L321

@jamesdbrock
Copy link
Member Author

case n of
  _ | n < 128 -> do
    Builder.putUint8 n
  _ | n < 16384 -> do
    buf <- ArrayBuffer.empty 2
    dv <- DataView.whole buf
    DataView.setInt8 dv 0 $ contGroup $ n `and` u0x7F
    DataView.setInt8 dv 1 $ (n `zsh` 7) `and` u0x7F
    Builder.putArrayBuffer buf
  _ | n < 2097152 -> do

@jamesdbrock
Copy link
Member Author

jamesdbrock commented Jun 24, 2021

Just for fun

What if we had a set of builder primitives which returned

  • The byte size required to write the primitive
  • A function which takes an offset to an ArrayBuffer and writes the primitive.
type Putter :: Tuple ByteSize (ByteOffset -> ArrayBuffer -> Effect Unit)
putInt8 :: Int -> Putter
putStringUTF8 :: String -> Putter
putFloat64 :: Number -> Putter

Then we could build an ArrayBuffer in two passes, without ever allocating any temporary ArrayBuffers:

  1. Collect the Putters, add their sizes, allocate the final ArrayBuffer.
  2. Call the writer function of each Putter to fill the ArrayBuffer.

Of course we would have to allocate all of these temporary Putters, which is certainly more expensive than allocating the temporary ArrayBuffers.

Maybe there is a way to carry the Putter in a monad and have two different ways to exec the monad, one to calculate ByteSize and one to write to an ArrayBuffer?

Or something like this monoidal append?

<> :: Putter -> Putter -> Putter
Tuple sl wl <> Tuple sr wr = Tuple (sl + sr) (\o ab -> wl o ab *> wr (o + sl) ab)

The thing is that, in cases like encoding a UTF8 string, if you want to know the required size of the encoded string, you have to walk the whole string and do the encoding in order to calculate the size. Won't it always be faster to just write the encoding to a temporary buffer while you're at it, and then copy that temporary buffer later?

I guess in the case of encoding a UTF8 string we could encode the temporary buffer and enclose it with the writer function which, when called, just memcpys the temporary buffer. I'm not saying that's a good idea. But it's a thing we could do.

@jamesdbrock
Copy link
Member Author

jamesdbrock commented Jun 24, 2021

Just for fun

What we want to do is read a ByteSize in context r and write c in context w, kind of like this:

https://hackage.haskell.org/package/codec-0.2.1/docs/Control-Monad-Codec.html

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant